Introduction


DEVtools is the software development toolkit for the Pixel Machine. DEVtools differs from the
other Pixel Machine libraries (PIClib, RAYlib, etc.) in that DEVtools users program both the host
and the processors in the Pixel Machine. Users of the other libraries, however, program only the
host system; the Pixel Machine functions are performed implicitly by the libraries supplied with the
Pixel Machine.


DEVtools enables users to implement a wide variety of applications that take advantage of the 
graphics and compute power of the Pixel Machine. Consequently, DEVtools users require a deeper 
understanding of the Pixel Machine architecture and the DSP32 processor than do users of the other 
libraries. 

DEVtools is designed to provide a high level programming model for the Pixel Machine. This
model enables users to quickly develop Pixel Machine applications without having to know or
understand the details of the inner workings of the Pixel Machine. However, DEVtools does supply
the detailed information for those users with special needs that may require lower level access to
the Pixel Machine hardware.


DEVtools comprises the following: 
■ DSP32 C compiler, assembler, linker, library and miscellaneous other utilities
■ a library of host functions that control and communicate with the Pixel Machine
■ a library of Pixel Machine functions that are used to program the pipe and pixel nodes


Components of a Typical DEVtools Application 


A typical DEVtools application consists of a host program, pipe node programs and a pixel node 
program. 


The host program serves as the controller or master of the application. The application is initiated 
by invoking the host program in the same manner as any other host program. The host program, 
through the use of DEVtools function calls, loads the Pixel Machine executable files into the pipe 
and pixel nodes and initiates execution. Once execution has begun the host is responsible for send- 
ing data and commands to the Pixel Machine, and for servicing message requests for the Pixel 
Machine to perform operations such as input/output. 


Pipe node programs are used to perform transformations on the data produced by the host before the
data is sent to the pixel nodes. Many DEVtools applications do not require use of the pipe. The
Pixel Machine can be configured without a pipe for users with no need for pipeline processing.
When an application does not use the pipe but is run on a system equipped with a pipe, a program
must be loaded into the pipe that passes the data through the pipeline to the pixel nodes. DEVtools
includes a pipe program that performs this function. Applications that do make use of the pipe can
load a different pipe program into each pipe node or they can load the same program into every
node. DEVtools includes functions to read and write command information, send messages to the
host system, control access to the pixel broadcast bus, and send data to the host feedback FIFO.

Pixel node programs are typically the core of the application. The same program is usually loaded
into all pixel nodes, although this does not have to be the case. Pixel node programs read commands
from either the host or pipe, process the commands, and produce results in the distributed
frame buffer. Applications that produce non-graphical results can send the data back to the host for
storage or output. Applications that use data distributed among the pixel nodes can use the serial
I/O communications facility for interprocessor communication. DEVtools includes functions to read
commands, send messages to the host, perform frame buffer I/O, serial I/O, memory management,
processor synchronization, etc.


Pipe and pixel node programs are created in much the same manner as would be used to create host
executables, with the exception that the command devcc is used in place of cc.

Introduction


Generating a realistic image from complex two and three dimensional data in real time demands a 
lot of computational power. Graphics and image processing algorithms, particularly rendering algo- 
rithms, often perform a set of operations to generate each pixel, with little or no interaction between 
pixels. These algorithms are candidates for mapping to a parallel architecture, with performance 
increasing nearly linearly with the number of processors. 


In many display systems, a single custom processor handles the typical frame buffer operations. 
This approach is adequate for rendering simple two-dimensional images. However, when realisti- 
cally shaded images must be displayed in real time, a single processor cannot provide the necessary 
computational power. 


Pipelines or arrays of special purpose processors provide high performance at the expense of flexi- 
bility. Their performance improvements are limited to the narrow range of algorithms that they 
were designed to solve. 


While the use of parallelism and pipelining gives a system the power needed to render high quality
images in real time, the use of programmable processors provides the flexibility to attain high
performance for a wide range of graphics and image processing algorithms. It is much easier to change
a program from Gouraud shading to Phong shading, for example, than to redesign a customized
processor.


The AT&T Pixel Machine combines the strengths of both coarse grain pipelining and multiple 
instruction/multiple datapath (MIMD) computing arrays. A pipeline of computing elements 
processes the serial tasks that precede pixel-level processing while a processor array provides high- 
bandwidth access to an integrated frame buffer and computes individual pixel values. The proces- 
sors in both the pipeline and the array are programmable, with hardware floating point operations. 


The programmability of the processors allows all algorithms to be implemented in software. A set 
of mapping functions transfer frame buffer algorithms written for conventional serial computers to 
algorithms that execute in the pixel nodes and access the distributed frame buffer. The ability to 
use floating point computations in frame buffer operations such as antialiasing, ray tracing, and cas- 
caded filtering, allows high quality image generation. 


The Pixel Machine provides up to 820 megaflops of processing power and 48 megabytes of memory 
for data visualization applications, including three-dimensional rendering and animation, image pro- 
cessing, and display of multi-dimensional data. 

Design


The Pixel Machine combines the strengths of both coarse grain pipelining and MIMD computing 
arrays to provide the performance of a supercomputer on image synthesis and image analysis appli- 
cations. Synthesis applications include the generation and display of two and three dimensional 
scenes as well as the visualization of scientific and engineering computations. Analysis applications 
include the processing and interpretation of image data from, say, a nuclear magnetic resonance 
machine or a satellite. The design philosophy is: 


■ Use floating point computation and large image memories, which are useful for image processing.
■ Design simple, modular processors that can be repeated a number of times to build a system.
■ Implement all algorithms in software.

The modular approach enabled the Pixel Machine to be designed, built, and programmed by a small
group of people in a short period of time. The decision to implement algorithms in software rather
than special purpose VLSI chips gives wide functionality, faster implementation of new algorithms,
and easier modification of existing ones.


The architecture has the following features: 
■ AT&T DSP32 processors
■ 0, 9 or 18 pipe nodes, configurable as zero, one or two pipelines
■ 16, 20, 32, 40, or 64 pixel nodes
■ 32-bit pixel and z-buffer data
■ floating-point computation for pixel generation
■ a frame buffer with pixel-interleaved parallel architecture
■ 1280x1024 or 1024x1024 high-resolution 60 Hz non-interlaced display
■ NTSC and PAL display modes
■ a large image memory that allows single, double, or quadruple buffering
■ software that transparently handles the different frame buffer sizes

Figure 1-1: Pixel Machine block diagram

Both the pipe nodes and the pixel nodes include an AT&T DSP32 Digital Signal Processor, a
32-bit, high speed, programmable device whose features include:

■ 20 MHz, 5 MIPS, 10 MFLOPS
■ 4K bytes of on-chip memory
■ 32-bit floating point arithmetic
■ four 40-bit floating point registers
■ twenty-one 16-bit integer and address registers
■ an interface to off-chip expansion memory
■ parallel and serial I/O ports with DMA


The DSP32 can be programmed in assembler language or in C. The software development environ- 
ment includes a compiler, an assembler, a linking loader, and a simulator. All arithmetic operations 
on data are floating point operations. Only memory address generation and program control calcula- 
tions use integer arithmetic. Software is developed on a host computer, typically a SUN or SGI 
workstation. The Pixel Machine is connected to the host computer via the VMEbus. 


Inside the Pixel Machine, there are 0, 9 or 18 pipe nodes configured as zero, one or two pipelines, a 
broadcast bus that transfers data from the end of the pipes to the pixel nodes, an array of 16, 20, 32, 
40, or 64 pixel nodes that form a distributed frame buffer, and a pixel funnel that transfers digital 
video data from the frame buffer to the video processor, which controls the display monitor. 
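
The 820 megaflops quoted in the introduction can be cross-checked against these node counts: the largest configuration has 18 pipe nodes and 64 pixel nodes, and each node holds one DSP32 rated at 10 MFLOPS. The following is only a worked-arithmetic sketch in C; the function name peak_mflops is hypothetical and is not part of DEVtools.

```c
#include <assert.h>

/* Peak rating of one DSP32, taken from the feature list in the text. */
#define DSP32_MFLOPS 10

/* Hypothetical helper: peak MFLOPS of a configuration, one DSP32 per node. */
int peak_mflops(int pipe_nodes, int pixel_nodes)
{
    assert(pipe_nodes >= 0 && pixel_nodes >= 0);
    return (pipe_nodes + pixel_nodes) * DSP32_MFLOPS;
}
```

With 18 pipe nodes and 64 pixel nodes this gives (18 + 64) x 10 = 820, which matches the quoted peak figure.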

Pixel Machine Architecture





Each pipe and pixel node can be viewed as a small independent computer that executes its
instructions and operates on data asynchronously with all the other nodes. Programs are loaded into
the nodes by the host, using unique, software-defined node numbers to distinguish between them.

Figure 1-2 shows a block diagram for a pipe node. Each pipe node has a DSP32 processor that
executes five million instructions or ten million floating point operations per second. The parallel
DMA interface of each processor is connected to the VMEbus. The pipe nodes have 9Kx32 bits of
memory for instructions and data, a 512x32 bit input FIFO containing data written by the previous
pipe node, and a 512x32 bit output FIFO where all output is written, to be read by the next node in
the pipeline.


Figure 1-2: Pipe node block diagram

The host computer provides input to the first pipe node via the VMEbus. The output from the last
pipe node is broadcast to all of the pixel nodes. In addition, the last pipe node has a second output 
FIFO that can be read by the host, again via the VMEbus. 


A system can have 0, 9 or 18 pipe nodes. The 18-node systems are software-configurable as either
two nine-node parallel pipelines or one 18-node pipeline (see Figure 1-3). In a two pipeline system,
the last node in each pipeline has the ability to request, acquire, and release the broadcast bus. In
the one pipeline system, the last node has continuous access to the broadcast bus.

Figure 1-3: Pipeline configurations

The pipe nodes perform those parts of the algorithms that are serial in nature and can be pipelined.
These include 3D transformations, clipping, projections, shading, and image filtering. The pipeline 
can also be used as a hardware subroutine by processes running in the host computer, which can 
send data to the first node and read results from the last one. 

Pipe Node Memory Areas


This section describes the memory areas and their use within pipe node programs. Direct use of 
these memory areas and flags is discouraged because the addresses of the areas and other dependen- 
cies, such as timing requirements, are considered to be implementation defined and may be different 
in future systems. When it is necessary to access these memory areas, the symbolic names given 
below and defined in the header file pipe.h should be used. 

Table 1-1: Memory Map of a Pipe Node’s Address Space

Address        Name              Mode  Description
-------------  ----------------  ----  ---------------------------------------
0000 - 0060                            crt0 (startup code)
0060 - 7fff                            static RAM for program and data storage
c000 - c1ff                            Input FIFO
c000           PM_FIFOIN               Input FIFO
c000           PM_FIFOIN_L             Input FIFO - low word
c002           PM_FIFOIN_H             Input FIFO - high word
c200 - c3ff                            Output FIFO
c200           PM_FIFOOUT              Output FIFO
c200           PM_FIFOOUT_L            Output FIFO - low word
c202           PM_FIFOOUT_H            Output FIFO - high word
               PM_EMPTY_IN             Input FIFO empty flag
c600           PM_HALF_IN              Input FIFO half-full flag
               PM_HALF_OUT             Output FIFO half-full flag
               PM_FULL_OUT             Output FIFO full flag
                                       on-chip ROM (unusable)
f000 - ffff                            static RAM for program and data storage

Last Node on a Board

               PM_BUS_RELEASE          Broadcast bus release
                                       Feedback FIFO
               PM_FIFOFB               Feedback FIFO
               PM_FIFOFB_L             Feedback FIFO - low word
               PM_FIFOFB_H             Feedback FIFO - high word
               PM_HALF_FB              Feedback FIFO half-full flag
               PM_FULL_FB              Feedback FIFO full flag
               PM_BUS_GRANT            Broadcast bus grant
               PM_BUS_REQUEST          Broadcast bus request
               PM_PIXEL_ALLRDY         Pixel node vsync flag
               PM_PIXEL_XFLAG          Pixel node psync flag

Pipe Nodes


The mode field in the memory map defines whether the address can be read (R), written (W), or 
both (R/W). Memory areas that are not defined must not be referenced. 


Sync Signal Selectors 


The flags in the memory map designated with an asterisk ("*") are boolean values whose state may
be sensed by the DSP32 conditions sys and syc (sync set and sync clear). The value of each of
these flags is routed through a multiplexer. The value to be passed to the DSP32 is selected by
writing any value into the address associated with the flag to be sensed. For example, to check the
input FIFO half-full flag, you would write a value to the address c600 (PM_HALF_IN), then check
the sync signal. If the signal is set, then the condition is true; the FIFO is half full.


The sync signal must not be tested immediately after setting one of the signal selector flags because
a short delay is required for the hardware to switch the signals. The minimum delay is three
instructions for the last node on a board and two instructions for all of the other nodes.
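
The select, wait, then test sequence can be modelled on the host as a small simulation. This is only an illustrative sketch: the types and function names (sync_mux, sync_select, sync_tick, sync_is_set) are hypothetical and are not DEVtools calls, and on the real hardware the test is made with the DSP32 sys and syc conditions rather than a C function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for four of the selectable pipe node flags. */
enum pipe_flag { EMPTY_IN, HALF_IN, HALF_OUT, FULL_OUT, NUM_FLAGS };

struct sync_mux {
    bool flags[NUM_FLAGS];   /* current state of each hardware flag */
    enum pipe_flag selected; /* flag currently routed to the sync signal */
    int settle;              /* instructions left before sync is valid */
};

/* Models "write any value to the address associated with the flag". */
void sync_select(struct sync_mux *m, enum pipe_flag f, bool last_node_on_board)
{
    m->selected = f;
    m->settle = last_node_on_board ? 3 : 2; /* minimum delay from the text */
}

/* Models executing one unrelated instruction, for example a nop. */
void sync_tick(struct sync_mux *m)
{
    if (m->settle > 0)
        m->settle--;
}

/* Models the sys condition; testing before the delay has elapsed is an error. */
bool sync_is_set(const struct sync_mux *m)
{
    assert(m->settle == 0);
    return m->flags[m->selected];
}
```

The assertion in sync_is_set makes the timing rule explicit: a caller that tests the signal before the required number of instructions has elapsed fails immediately in the model, whereas on the hardware it would read a stale value.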


Static RAM 


The static RAM area totals 36k bytes of memory for general purpose program and data storage.
The standard memory definition file (ifile) designates 0060 through 7fff for program storage and
f000 through ffff for data storage. This can be changed by supplying an ifile to the linker that
distributes the memory in the manner desired.
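
The 36k figure can be checked from the address ranges: 0000 through 7fff (startup code plus program storage) and f000 through ffff (data storage) together hold as many bytes as 9K words of 32 bits. A small C sketch of the arithmetic follows; the helper name range_bytes is hypothetical.

```c
#include <assert.h>

/* Size in bytes of the inclusive address range [first, last]. */
long range_bytes(long first, long last)
{
    assert(last >= first);
    return last - first + 1;
}
```

Here range_bytes(0x0000, 0x7fff) is 32768 and range_bytes(0xf000, 0xffff) is 4096; the sum, 36864, equals 9216 words of four bytes each.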


Input FIFO 


The input FIFO contains up to 2048 bytes of data, organized as 512 units of four bytes each. The
input FIFO may be read as one four-byte word, two 2-byte words, or four bytes; however, all four
bytes of each FIFO entry must always be read in order for the contents of each byte of the FIFO to
remain synchronized with the others. The status of the input FIFO is checked by writing a value to
input empty or input half-full, then checking the sync flags (sys or syc). The FIFO must not be
read when it is empty.
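
The FIFO rules above (512 four-byte entries, never read when empty, never write when full) can be illustrated with a host-side model. This is a sketch only: the struct and function names are hypothetical, and the interpretation of "half-full" as "at least 256 entries in use" is an assumption rather than something the text states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_ENTRIES 512 /* 512 entries of four bytes = 2048 bytes */

struct fifo {
    uint32_t data[FIFO_ENTRIES];
    int head;  /* index of the next entry to read */
    int count; /* number of entries currently stored */
};

bool fifo_empty(const struct fifo *f) { return f->count == 0; }

/* Assumption: half-full means at least half of the entries are in use. */
bool fifo_half_full(const struct fifo *f) { return f->count >= FIFO_ENTRIES / 2; }

bool fifo_full(const struct fifo *f) { return f->count == FIFO_ENTRIES; }

void fifo_write(struct fifo *f, uint32_t word)
{
    assert(!fifo_full(f)); /* the FIFO must not be written when it is full */
    f->data[(f->head + f->count) % FIFO_ENTRIES] = word;
    f->count++;
}

uint32_t fifo_read(struct fifo *f)
{
    uint32_t word;
    assert(!fifo_empty(f)); /* the FIFO must not be read when it is empty */
    word = f->data[f->head];
    f->head = (f->head + 1) % FIFO_ENTRIES;
    f->count--;
    return word;
}
```

Each model entry is a whole 32-bit word, which mirrors the requirement that all four bytes of an entry are always transferred together.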


The FIFO can be read using any of the 32-bit (4-byte) words that are mapped to the output port of
the FIFO. This allows a program to use a standard block move routine to read from the FIFO as long
as no more than 128 4-byte words are moved at one time.
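
A transfer longer than 128 words therefore has to be split into several block moves. The sketch below shows one way to do the splitting; it is an illustration in host C, not DEVtools code. The function name fifo_read_chunked is hypothetical, and the stream argument is an ordinary array standing in for the words the FIFO would deliver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIFO_WINDOW_WORDS ((size_t)128) /* largest block move allowed */

/*
 * Copy nwords words from a simulated FIFO stream into dst using block
 * moves of at most 128 words each. Returns the number of block moves.
 */
size_t fifo_read_chunked(uint32_t *dst, const uint32_t *stream, size_t nwords)
{
    size_t moves = 0;

    assert(nwords == 0 || (dst != NULL && stream != NULL));
    while (nwords > 0) {
        size_t n = nwords < FIFO_WINDOW_WORDS ? nwords : FIFO_WINDOW_WORDS;
        memcpy(dst, stream, n * sizeof *dst); /* one "standard block move" */
        dst += n;
        stream += n;
        nwords -= n;
        moves++;
    }
    return moves;
}
```

A 300-word transfer, for example, is carried out as three moves of 128, 128, and 44 words.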


Output FIFO 


The output FIFO is the input FIFO of the next pipe node. For the last node in the pipeline, the
output FIFO is the broadcast bus to the input FIFOs of the pixel nodes. As with the other FIFOs, all
four bytes of each entry must always be written in order to maintain synchronization. The status of
the output FIFO is checked by writing a value to output full or output half-full, then checking the
sync flags (sys or syc). The FIFO must not be written when it is full. The FIFO of the last node
(the broadcast bus) must not be written unless the bus is granted to the board that wishes to do the
write operation.

The FIFO can be written using any of the 32-bit (4-byte) words that are mapped to the output port of
the FIFO. This allows a program to use a standard block move routine to write to the FIFO as long
as no more than 128 4-byte words are moved at one time.


Feedback FIFO 


Each pipe board contains a feedback FIFO that can be read by the host system. Both feedback
FIFOs can be used regardless of whether the pipes are operating in serial or parallel mode. As with
the other FIFOs, all four bytes of each entry must always be written in order to maintain
synchronization. The status of the feedback FIFO is checked by writing a value to feedback full or
feedback half-full, then checking the sync flags (sys or syc). The FIFO must not be written when it
is full.


The FIFO can be written using any of the 32 bit 4-byte words that are mapped to the output port of 
the FIFO. This allows a program to use a standard block move routine to write to the FIFO as long 
as no more than 128 4-byte words are moved at one time. 


Accessing the Broadcast Bus 


In a dual-pipe system with the pipes operating in parallel mode, only one pipe has access to the
broadcast bus at any point in time. When operating in parallel, each pipe must release access to the
broadcast bus to the other pipe on a regular basis, because both pipes must be able to write to the
broadcast bus in order to achieve optimal performance.


Access to the broadcast bus is controlled by three memory locations: PM_BUS_REQUEST,
PM_BUS_RELEASE, and PM_BUS_GRANT. PM_BUS_REQUEST is used to request access to
the broadcast bus. PM_BUS_RELEASE is used to relinquish control of the broadcast bus to the
other pipe. PM_BUS_GRANT is used to connect the bus grant signal to the sync signal of the
DSP32 so that the software can sense whether access to the bus has been granted.


To gain access to the bus:

    *PM_BUS_REQUEST = r1      /* Can be any register; the contents don't matter */
    *PM_BUS_GRANT = r1
    2*nop                     /* Delay needed before the signal can be read */
  loop:
    if (syc) goto loop        /* Wait for grant signal to be true */
    nop

To release the broadcast bus:

    *PM_BUS_RELEASE = r1      /* Can be any register; the contents don't matter */


Single pipe configurations and dual pipe configurations operating in series do not need to check
the bus grant flag because the bus will always be granted to the node that has access to the
broadcast bus.


Pixel Node Flags 

Pipe activity can be synchronized with pixel node activity by using the PM_PIXEL_ALLRDY and 
PM_PIXEL_XFLAG signals. The PM_PIXEL_ALLRDY flag is true when all of the pixel nodes 
have set their vsync signals. PM_PIXEL_XFLAG is true when all of the pixel nodes have set their 
psync signals. These signals are available to both pipes of a dual pipe system in both serial and 
parallel modes. 







Pixel Nodes 


The pixel nodes form an n x m array with a distributed frame buffer. They receive their data from
the broadcast bus of the pipe nodes and store their output into the frame buffer or return it to the
host computer. Mapping registers provide uniform access to the frame buffer across different
configurations of pixel nodes, and a four-way multiplexed I/O switch and channel allows two-way
communication with the four neighboring pixel nodes.


A DSP32 is the computing element in each pixel node. Just as in the pipe nodes, there is an 
8192x32 bit static RAM in the node, in addition to the 1024x32 bits of on-chip storage. Figure 1-4 
shows a block diagram. 


Figure 1-4: Pixel node block diagram (memory blocks shown: 9216x32 SRAM, 64Kx32 DRAM, and two
64Kx32 VRAM banks)


Two banks of 64Kx32 bit VRAMs form the pixel node’s piece of the distributed frame buffer.
The video RAMs store the red, green, blue, and alpha settings for the pixels. These memories can
be displayed or used as off-screen storage for images.


Pixel nodes also contain a 64Kx32 bit dynamic RAM that can be used to hold floating point z- 
buffer values or any data in byte or word format, pixels in floating point representation, sections of 
display list, or code segments. The processor can execute instructions from this memory, although 
it is slower than executing from either the on-chip or SRAM memory. 

Table 1-2: Pixel array configurations

Pixel   Display      Subscreen              Subscreens
nodes   resolution   size        # pixels   per node
-----   ----------   ---------   --------   ----------
16      1024x1024    128x128                4
20      1280x1024    128x128     65536      4
32      1024x1024    128x128     32768      2
40      1280x1024    128x128     32768      2
64      1024x1024    128x128                1
64      1280x1024    160x128                1





A Pixel Machine can be configured with 16, 20, 32, 40, or 64 pixel nodes. Table 1-2 summarizes 
these five configurations. The two video RAMs and the dynamic RAM (when used to store pixel 
data) are organized as three blocks of 256x256 32-bit pixels. Each bank can be logically divided 
further into smaller blocks, called subscreens (see Chapter 3 for a discussion on subscreens). 


In order to allow configuration-independent software, the concept of virtual pixel nodes that reside
inside physical nodes is introduced. Each virtual node accesses a single subscreen. All systems
have either 64 or 80 virtual nodes, depending on the resolution of the display screen. In a 64-node
system, each physical pixel node contains a single virtual node, while a 16- or 20-node system has
four virtual nodes per physical node.
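
The relationship between memory blocks, subscreens, and virtual nodes is simple arithmetic, sketched here in C. Both function names are hypothetical; the sketch assumes subscreen dimensions that divide 256 evenly, so it covers the 128x128 case but not a 160x128 subscreen.

```c
#include <assert.h>

/* Number of w x h subscreens that tile one 256x256 memory block. */
int subscreens_per_block(int w, int h)
{
    assert(w > 0 && h > 0 && 256 % w == 0 && 256 % h == 0);
    return (256 / w) * (256 / h);
}

/* Number of virtual nodes hosted by each physical node. */
int virtual_per_physical(int virtual_nodes, int physical_nodes)
{
    assert(physical_nodes > 0 && virtual_nodes % physical_nodes == 0);
    return virtual_nodes / physical_nodes;
}
```

For instance, a 256x256 block holds four 128x128 subscreens, and 64 virtual nodes spread over 16 physical nodes (or 80 over 20) gives four virtual nodes per physical node, as stated above.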

Figure 1-5: Pixel format

Pixel data is stored in the frame buffer as 16-bit signed integers. The four components of a pixel
(red, green, blue, alpha) form two 32-bit words, as shown in Figure 1-5. Only eight of the 16 bits in
a pixel component are populated with memory.


Each pixel node has a serial input/output (SIO) channel that provides a communication path to its 
four nearest neighbors, allowing the Pixel Machine to function as a computing mesh. Node place- 
ment follows pixel interleaving conventions, as shown in Figure 1-6. Thus, in a 4x4 array of pixel 
nodes, node 5’s neighbors are nodes 1, 4, 6, and 9. The edges of the mesh wrap around to form a 
torus, so node 0’s neighbors are 1, 3, 4, and 12, for example. 
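
The neighbor relationships in this example can be computed directly. The sketch below assumes nodes are numbered row by row (row-major order), which is consistent with the 4x4 example in the text; the struct and function names are hypothetical.

```c
#include <assert.h>

struct neighbors { int up, down, left, right; };

/*
 * Four nearest neighbors of a node on a rows x cols mesh whose edges
 * wrap around to form a torus. Assumes row-major node numbering.
 */
struct neighbors torus_neighbors(int node, int rows, int cols)
{
    struct neighbors n;
    int r, c;

    assert(rows > 0 && cols > 0 && node >= 0 && node < rows * cols);
    r = node / cols;
    c = node % cols;
    n.up    = ((r + rows - 1) % rows) * cols + c;
    n.down  = ((r + 1) % rows) * cols + c;
    n.left  = r * cols + (c + cols - 1) % cols;
    n.right = r * cols + (c + 1) % cols;
    return n;
}
```

For a 4x4 array, node 5 yields neighbors 1, 9, 4, and 6, and node 0 yields 12, 4, 3, and 1, in agreement with the two cases given above.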


The SIO capability at each node consists of one input and one output serial port that operates at
peak rates of 16 Mbits/sec. Pixel data can be moved from node to node at a sustained rate of
5.25 Mbits/sec, including the time spent buffering pixel data to and from the display memory. In
practice, however, processor cycles will be shared between an application program and SIO, and the
data transfer rate will be proportionately slower.

Pixel Nodes


This section describes the memory areas and their use within pixel node programs. Direct use of
these memory areas and flags is discouraged because the addresses of the areas and other
dependencies, such as timing requirements, are considered to be implementation defined and may be
different for future systems. When it is necessary to access these memory areas, the symbolic names
given below and defined in the header file pixel.h should be used.

Address        Name           Mode  Description
-------------  -------------  ----  ------------------------------------------------
0000 - 0060                   R/W   crt0 (startup code)
0060 - 7fff                   R/W   static RAM for program and data storage
8000 - bfff    PM_PIXEL_MEM   R/W   reserved for VRAM/ZRAM access via page registers
8000 - 83ff                   R/W   memory reference via page register 0
8400 - 87ff                   R/W   memory reference via page register 1
8800 - 8bff                   R/W   memory reference via page register 2
...
bc00 - bfff                   R/W   memory reference via page register 15
c800 - c840    PM_MAP_ADDR    R/W   page register storage
c800 - c803                   R/W   page register 0
c804 - c807                   R/W   page register 1
c808 - c80b                   R/W   page register 2
...
c83c - c840                   R/W   page register 15
d002 - d003    PM_FLAG_REG          drawing mode register
d800 - dfff                         Input FIFO
d800 - d803    PM_FIFO_32     R     Input FIFO
d800 - d801    PM_FIFO_16L    R     Input FIFO - low word
d802 - d803    PM_FIFO_16H    R     Input FIFO - high word
e000 - efff                         on-chip ROM (unusable)
f000 - ffff                   R/W   static RAM for program and data storage

The mode field in the memory map defines whether the address can be read (R), written (W), or
both (R/W). Memory areas that are not defined must not be referenced.
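
The memory map implies a regular layout for the paged window: the region 8000 through bfff is divided into 16 pages of 0400 bytes, and the page registers are four bytes apart starting at c800. A short C sketch of that address arithmetic follows. The constant and function names are illustrative only and are not the definitions in pixel.h; the numeric values come from the memory map.

```c
#include <assert.h>

#define PIXEL_MEM_FIRST 0x8000u /* first byte of the paged window */
#define PIXEL_MEM_LAST  0xbfffu /* last byte of the paged window */
#define PAGE_BYTES      0x0400u /* 8000-83ff, 8400-87ff, ... */
#define MAP_ADDR_FIRST  0xc800u /* address of page register 0 */

/* Index (0..15) of the page register that governs a window address. */
unsigned page_index(unsigned addr)
{
    assert(addr >= PIXEL_MEM_FIRST && addr <= PIXEL_MEM_LAST);
    return (addr - PIXEL_MEM_FIRST) / PAGE_BYTES;
}

/* Address of the first byte of a given page register. */
unsigned page_register_addr(unsigned index)
{
    assert(index < 16);
    return MAP_ADDR_FIRST + 4u * index;
}
```

For example, address 87ff falls in page 1, and page register 15 starts at c83c.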


Static RAM


The static RAM area totals 36k bytes of memory for general purpose program and data storage.
The standard memory definition file (ifile) designates 0060 through 7fff for program storage and
f000 through ffff for data storage. This can be changed by supplying an ifile to the linker that
distributes the memory in the manner desired.


Flag Register


Each pixel node has a flag register that contains:

■ the sync signal selection flags
■ the node’s psync and vsync flags
■ the flags that select the video buffer to be displayed
■ the overlay flag for the processor

The flag register is 16 bits wide and is accessed through the address d002 (PM_FLAG_REG). The
following describes the structure of the register:


    o v1 v0 f r sss


where:

■ o is the overlay flag. For overlaying to be enabled, the overlay flag of all the drawing nodes
  must be set, and the overlay flag for all the pixel board mode registers (see below) must also
  be set.

■ v0 and v1 designate which area of the video memory should be displayed. v0 controls
  whether the top buffer or bottom buffer is displayed (0 displays the top buffer). If v1 is
  false, the image is displayed starting at the first pixel of image memory. If v1 is true, an
  offset of 128 pixels is used.

■ f is the processor’s psync flag.

■ r is the processor’s vsync flag.

■ sss is the sync signal selection flag. The sync signal selection flag is used to designate which
  of several flags is to be connected to the DSP32 sync signal. Once selected, this signal may
  be sensed by the DSP32 conditions sys and syc (sync set and sync clear). The value of sss
  must be one of the following:

    □ 000 (PM_EMPTY_LOWER): input FIFO 16 bits (not usually used)
    □ 001 (PM_HALF_FULL): lower 16 bits (not usually used)
    □ 010 (PM_DRAW_EMPTY): input FIFO empty flag
    □ 011 (PM_DRAW_HALF): input FIFO half-full flag
    □ 100 (PM_VERTBLNK): vertical blanking flag
    □ 101 (PM_HORZBLNK): horizontal blanking flag
    □ 110 (PM_XFLAG): all processors’ vsync flags are set
    □ 111 (PM_ALLRDY): all processors’ psync flags are set

There must be a delay between setting the flag register and testing the sync signal in order for the
hardware to have time to switch the signals. A minimum of two instructions must be executed
between setting the flag register and checking the sync signal.
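
A value for the flag register can be assembled from its fields as sketched below. Two things here are assumptions and not taken from the text: the fields are assumed to occupy the low eight bits of the register in the order o, v1, v0, f, r, sss (most significant first), and every name in the sketch (the SEL_ constants, flag_fields, flag_reg_pack, flag_reg_selector) is hypothetical. Only the three-bit selector codes come from the list above.

```c
#include <assert.h>
#include <stdint.h>

/* Selector codes 000 through 111, in the order listed in the text. */
enum sync_select {
    SEL_EMPTY_LOWER = 0, SEL_HALF_LOWER = 1,
    SEL_DRAW_EMPTY = 2,  SEL_DRAW_HALF = 3,
    SEL_VERTBLNK = 4,    SEL_HORZBLNK = 5,
    SEL_XFLAG = 6,       SEL_ALLRDY = 7
};

struct flag_fields {
    unsigned o, v1, v0, f, r; /* single-bit fields, each 0 or 1 */
    enum sync_select sss;     /* three-bit sync signal selector */
};

/* ASSUMED layout: bit 7 = o, 6 = v1, 5 = v0, 4 = f, 3 = r, bits 2..0 = sss. */
uint16_t flag_reg_pack(struct flag_fields x)
{
    assert(x.o <= 1 && x.v1 <= 1 && x.v0 <= 1 && x.f <= 1 && x.r <= 1);
    return (uint16_t)((x.o << 7) | (x.v1 << 6) | (x.v0 << 5) |
                      (x.f << 4) | (x.r << 3) | (unsigned)x.sss);
}

/* Extract the sync signal selector from a register value. */
enum sync_select flag_reg_selector(uint16_t reg)
{
    return (enum sync_select)(reg & 0x7u);
}
```

Keeping the selector in its own field makes it easy to change which flag is routed to the sync signal without disturbing the overlay, buffer, and sync bits.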


Pixel Array Board Mode Register


Each pixel array board contains a mode register that contains:

■ the board level overlay mode flag
■ the video shift flag
■ gate flags that are used to disable pixel boards
■ serial I/O direction flags


The mode register is a 16-bit register that can only be accessed by the host. The structure of the
register is:


    ss gf gs vv oo


where: 

■ ss is the serial I/O direction to be used.

■ gf is the DEV_GATES_FIFO flag. If true, the FIFO flags are included when determining the
  FIFO flags that are passed to the pipe board(s) to determine whether they can broadcast to the
  pixel nodes.

■ gs is the DEV_GATES_SYNC flag. If true, the psync and vsync flags from the processors
  on this board are included when determining the value of the PM_ALLRDY and
  PM_XFLAG signals.

■ vv is the video shift mode, and must be one of the following:

    □ 00 (DEV_SHIFT_NOT964): used for all models except the 964
    □ 01 (DEV_SHIFT_TOP964): upper 4 lines of each 8 scan lines of the 964
    □ 11 (DEV_SHIFT_BOT964): bottom 4 lines of each 8 scan lines of the 964

■ oo is the overlay mode. This indicates whether or not overlaying is enabled, and if enabled,
  which of several modes is to be used. The value must be one of the following:

    □ 00 (DEV_OVERLAY_OFF): Overlaying is disabled.
    □ 01 (DEV_OVERLAY_ON): Overlaying is enabled. If the overlay value is zero, the
      RGB value is used. If the overlay value is 255, the inverse of RGB is used. If the
      overlay value is 1-254, the overlay value is used.
    □ 10 (DEV_OVERLAY_FORCE): The overlay value is always used.
    □ 11 (DEV_OVERLAY_MASK): If the high order bit (80 hex) of the overlay value is
      true, the overlay value is used, otherwise RGB is used.

The value displayed for each pixel is determined by:

■ the value of the pixel memory (RGB and overlay)
■ the overlay mode in the pixel mode register of each pixel array processor board
■ the overlay flag in each of the pixel node flag registers


If all of the overlay flags are on, overlay mode is determined by the overlay mode in the pixel mode
register. If all of the overlay flags are off, then DEV_OVERLAY_OFF mode is used.


The three components described above are used to make two decisions:

1. Which values should be sent to the video controller for RGB. The video controller accepts
   24 bits of color information, 8 bits each of red, green, and blue. The 24 bits can contain
   either the red, green, and blue pixel data, or the overlay data can be copied into the red,
   green, and blue fields (the same 8 bits are copied into each of the three colors).

2. Which lookup tables on the video controller should be used to produce the final color value.
   Normally the primary lookup table is used. When the overlay data is being displayed, the
   overlay table is used for those pixels.


Table 1-3: Memory Map of a Pixel Node’s Address Space 





















Overlay  Overlay  Overlay              Lookup
Flag     Mode     Value      RGB       Table
-------  -------  ---------  --------  -------
OFF      (any)    (any)      RGB       Primary
ON       OFF      (any)      RGB       Primary
ON       ON       0          RGB       Primary
                  1-254      overlay   Overlay
                  255        ~RGB      Primary
ON       FORCE    (any)      RGB       Overlay
ON       MASK     0-127      RGB       Primary
                  128-255    overlay   Overlay








When the overlay flag in the pixel nodes is off (false), the RGB data is displayed using the primary 
lookup table in all cases. When the overlay flag in the pixel nodes is true, the displayed value 
depends on the overlay mode and the contents of each overlay pixel. 


DEV_OVERLAY_ON: If the overlay value is zero, the red, green, and blue data is displayed using 
the primary lookup table. If the overlay value is in the range 1-254, the overlay value is used for 
red, green, and blue, and the overlay lookup table is used. If the overlay value is 255, the bitwise 
complement of red, green, and blue is displayed using the primary lookup table. 


DEV_OVERLAY_FORCE: The red, green, and blue data is displayed using the overlay lookup 
table. 


DEV_OVERLAY_MASK: If the overlay value is in the range 0-127, the red, green, and blue data
is displayed using the primary lookup table. If the overlay value is in the range 128-255, the
overlay value is used for red, green, and blue, and the overlay lookup table is used.
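
The overlay rules above can be condensed into one decision function. The following is a minimal sketch in host-side C, not DEVtools code: the type and function names (overlay_mode, video_out, resolve_pixel) are hypothetical and exist only to illustrate which color values and which lookup table the rules select.

```c
#include <stdint.h>

/* Hypothetical names for illustration only; not part of the DEVtools API. */
typedef enum { OVL_OFF, OVL_ON, OVL_FORCE, OVL_MASK } overlay_mode;
typedef enum { LUT_PRIMARY, LUT_OVERLAY } lut_select;

typedef struct {
    uint8_t r, g, b;   /* values sent to the video controller */
    lut_select lut;    /* lookup table used for this pixel */
} video_out;

/* Resolve one pixel according to the overlay rules described in the text. */
video_out resolve_pixel(int overlay_flag, overlay_mode mode,
                        uint8_t r, uint8_t g, uint8_t b, uint8_t ovl)
{
    video_out out = { r, g, b, LUT_PRIMARY };   /* default: RGB, primary table */

    if (!overlay_flag || mode == OVL_OFF)
        return out;

    switch (mode) {
    case OVL_ON:
        if (ovl == 255) {                 /* complement of RGB, primary table */
            out.r = (uint8_t)~r;
            out.g = (uint8_t)~g;
            out.b = (uint8_t)~b;
        } else if (ovl != 0) {            /* 1-254: overlay value in all fields */
            out.r = out.g = out.b = ovl;
            out.lut = LUT_OVERLAY;
        }
        break;
    case OVL_FORCE:                       /* RGB data through the overlay table */
        out.lut = LUT_OVERLAY;
        break;
    case OVL_MASK:
        if (ovl >= 128) {                 /* 128-255: overlay value, overlay table */
            out.r = out.g = out.b = ovl;
            out.lut = LUT_OVERLAY;
        }
        break;
    default:
        break;
    }
    return out;
}
```

For example, with the overlay flag on and the mode set to MASK, an overlay value of 127 leaves the RGB data and the primary table in effect, while 128 switches the pixel to the overlay value and the overlay table.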
Input FIFO


The input FIFO contains up to 2048 bytes of data, organized as 512 units of four bytes each. The
input FIFO may be read as one four-byte word, two 2-byte words, or four bytes. However, all four
bytes of each FIFO entry must always be read so that the contents of each byte of the FIFO
remain synchronized with the others. The status of the input FIFO is checked by setting the
PM_DRAW_EMPTY or PM_DRAW_HALF bit in the mode register and then checking the sync flags
(sys or syc). The FIFO must not be read when it is empty.


Z Memory 


The Z memory (also referred to as DRAM or ZRAM) consists of 256k bytes of dynamic RAM 
memory for each pixel node. The Z memory can be used for floating point values, integers, or 
bytes, and it is accessed through the use of page registers, which are described below. 


Video Memory 


Each node contains 512k bytes of video memory. The video memory is also accessed through the
use of page registers, and it is divided into two sections, VRAM0 and VRAM1. Each of these
areas is subdivided into two sections: one containing the red and green pixel components (RG0 and
RG1) and the other containing the blue and overlay pixel components (BO0 and BO1).


As described above, each color component of a pixel consists of 8 bits of data stored in the high 
order bits (the bits after the sign bit) of a short integer. In the red/green section the red pixel data is 
stored in the low order word of the 32 bit value for a pixel, and the green data is in the high order 
word. Since the byte ordering on the DSP32 is least significant byte first, the red pixel data is 
stored at byte location N, and the green information is at byte location N+2. In the blue/overlay 
region the blue data is in the low order word (memory address N) and the overlay data is in the 
high order word (memory location N+2). 
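
As a rough illustration of this layout, the sketch below packs and unpacks pixel components in portable C. It assumes that "the bits after the sign bit" means bits 14 through 7 of a 16-bit word; that reading, and the function names, are assumptions made for this example rather than definitions taken from DEVtools.

```c
#include <stdint.h>

/* Place an 8-bit component in bits 14..7 of a 16-bit word, leaving the
 * sign bit (bit 15) clear. Assumed interpretation of the text. */
uint16_t pack_component(uint8_t c)
{
    return (uint16_t)((uint16_t)c << 7);
}

/* Recover the 8-bit component from bits 14..7. */
uint8_t unpack_component(uint16_t w)
{
    return (uint8_t)((w >> 7) & 0xFF);
}

/* Build a 32-bit red/green value: red in the low-order word,
 * green in the high-order word. */
uint32_t pack_rg(uint8_t red, uint8_t green)
{
    return ((uint32_t)pack_component(green) << 16) | pack_component(red);
}
```

Because the DSP32 stores the least significant byte first, the low-order word of such a value sits at byte location N and the high-order word at N+2, which matches the red and green locations given above.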


Page Registers 


Page registers allow the pixel nodes to access 256k bytes of ZRAM and 512k bytes of video RAM 
even though the DSP32 only has a 16 bit address space. 


Memory addresses in the range 8000 through bfff are reserved for paged memory access. Put 
another way, all addresses that begin with the bit sequence 10 are reserved for paged memory 
access. Paged memory addresses have the form: 


10pp ppoo oooo oooo 
where:
10 designates this as a paged address
pppp is the page register number (0 to 15)
oooooooooo is the ten-bit offset from the address contained in the page register.

The page registers are accessed as four-byte values at memory locations starting at c800 for page
register 0, c804 for page register 1, and so on, up to c83c for page register 15. Only the 12 low
order bits of the page register are used. The structure of the page register is:


mbbb aaaa aaaa 


where m is the mode selection bit. The two addressing modes are: 


■ 0: Fixed row addressing. The address field contains the row number to be accessed. This
mode allows the access of a processor's pixels on a given scan line.


■ 1: Fixed column addressing. The address field contains the column number to be accessed.
This mode allows the access of a processor's pixels on a given screen column.


bbb is the bank selection field. The defined values are:
■ 001: PM_ZMEM - Z memory
■ 100: PM_RG0 - the red/green components stored in VRAM bank 0
■ 101: PM_BO0 - the blue/overlay components stored in VRAM bank 0
■ 110: PM_RG1 - the red/green components stored in VRAM bank 1
■ 111: PM_BO1 - the blue/overlay components stored in VRAM bank 1
aaaaaaaa is the 8-bit extended address value.
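
The bit layouts above can be checked with a few lines of C. This sketch only mirrors the field positions stated in the text; the function and constant names are hypothetical and are not the DEVtools macros.

```c
#include <stdint.h>

/* Mode bit m and bank field bbb, using the values listed in the text.
 * The identifiers themselves are invented for this sketch. */
enum { FIX_ROW = 0, FIX_COL = 1 };
enum { BANK_ZMEM = 1, BANK_RG0 = 4, BANK_BO0 = 5, BANK_RG1 = 6, BANK_BO1 = 7 };

/* 12-bit page register value: m bbb aaaaaaaa */
uint16_t page_descriptor(unsigned mode, unsigned bank, unsigned ext_addr)
{
    return (uint16_t)(((mode & 0x1u) << 11) |
                      ((bank & 0x7u) << 8) |
                      (ext_addr & 0xFFu));
}

/* Location of page register n: c800, c804, ..., c83c */
uint16_t page_register_location(unsigned n)
{
    return (uint16_t)(0xC800u + 4u * (n & 0xFu));
}

/* Paged address: 10 pppp oooooooooo */
uint16_t paged_address(unsigned n, unsigned offset)
{
    return (uint16_t)(0x8000u | ((n & 0xFu) << 10) | (offset & 0x3FFu));
}
```

With these definitions, page register 15 lives at c83c, and the paged addresses run from 8000 (register 0, offset 0) through bfff (register 15, offset 3ff), which agrees with the reserved range given earlier.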


Macros are provided to build and use page registers. They hide the internal structure of the page 
register and the physical addresses that are used. 


PMdesc is used to build a value to be stored in a page register. The format of the macro is:

PMdesc(mode, bank) + extended_address


mode must have the value PM_FIX_ROW or PM_FIX_COL
bank must have the value PM_ZMEM, PM_RG0, PM_BO0, PM_RG1, or PM_BO1

PMpagereg is used to access the locations in which the page registers are stored. The format of
the macro is:


PMpagereg(reg_number)


reg_number must have a value in the range of 0 to 15


PMxlate is used to generate an address that uses a page register. The format of the macro is:


PMxlate(reg_number)


reg_number must have a value in the range of 0 to 15


The following is an example of the macros used in assembly code. r2 holds the row index and r3
contains the column index:


    r1 = PMdesc(PM_FIX_ROW, PM_ZMEM) + r2   /* Access the Z memory row
                                               designated by the value in r2 */
    *PMpagereg(4) = r1                      /* Move the descriptor into
                                               page register 4 */
    r3 = r3 * 2                             /* Convert the column index in r3 to a byte */
    r3 = r3 * 2                             /* offset by multiplying by 4 (since each
                                               float takes up 4 bytes) */
    r4 = PMxlate(4) + r3                    /* Get the address of the value
                                               designated by the row
                                               number in page register 4,
                                               plus the offset in register r3 */
    a0 = *r4                                /* Move the desired value into register a0 */
Internode Communications


In some applications, it may be necessary for the pixel nodes to exchange data. For example, to 
scroll the image memory by one pixel requires that each processor send the entire contents of its 
pixel memory to a neighboring processor, replacing the pixels with those received from another 
neighboring processor. Another use is communicating intermediate results of a computation that is 
distributed over the processor array, for example, multiplying matrices that are distributed among 
the nodes. 


To accomplish this exchange of data, the Pixel Machine supports nearest-neighbor communications
among the pixel nodes. Because this communication is implemented using the serial I/O port of the
DSP chip, it is sometimes referred to as SIO (Serial I/O) to distinguish it from the node-host
communication implemented with the parallel I/O port.


Topology

Each node can communicate with one of four neighboring nodes over the communications links; the
neighbors of any node, and the links connecting to those neighbors, are referred to as North, South,
East, and West.
Figure 1-6: Link directions from a pixel node

The communications links between the mesh of pixel nodes form a torus: a mesh connected at the
edges (Figure 1-7). This allows every node to connect to four neighbors. The neighboring nodes are
arranged in the same pattern as pixels are interleaved among pixel nodes; complete topologies for
all Pixel Machine models are given at the end of this section.

Figure 1-7: Torus topology for a 4x4 processor mesh








While each node has links to four neighboring nodes, only one of the four links may be active at a 
given time. Furthermore, all nodes communicate in the same direction. Setting the active link 
(also referred to as setting the link direction) is done by the host using the DEVserial_direction 
call. This restriction means that all pixel nodes must agree on the order in which data is sent over 
the links, and to which destination nodes it is sent. The link direction is set to North, South, East, or
West, and sends data to the neighboring node in that direction, while receiving data from the
neighboring node in the opposite direction. For example, if the link is set East, a node will send data to
its East neighbor while receiving data from its West neighbor.
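
The wraparound behavior of the torus can be sketched in a few lines of C. This example models nodes as (row, col) positions in a simple rows x cols mesh and assumes that North decreases the row and East increases the column; the real link topologies are model-specific, so the names and the orientation here are illustrative assumptions only.

```c
/* Hypothetical types for this sketch; not DEVtools definitions. */
typedef enum { NORTH, SOUTH, EAST, WEST } link_dir;
typedef struct { int row, col; } node_pos;

/* Neighbor of p in direction d on a rows x cols torus.
 * Assumption: North decreases row, East increases col. */
node_pos neighbor(node_pos p, link_dir d, int rows, int cols)
{
    switch (d) {
    case NORTH: p.row = (p.row + rows - 1) % rows; break;
    case SOUTH: p.row = (p.row + 1) % rows;        break;
    case EAST:  p.col = (p.col + 1) % cols;        break;
    case WEST:  p.col = (p.col + cols - 1) % cols; break;
    }
    return p;
}

/* Direction from which data arrives when sending in direction d. */
link_dir opposite(link_dir d)
{
    switch (d) {
    case NORTH: return SOUTH;
    case SOUTH: return NORTH;
    case EAST:  return WEST;
    default:    return EAST;
    }
}
```

With the link direction set to d, a node at p sends to neighbor(p, d, rows, cols) and receives from neighbor(p, opposite(d), rows, cols). On a 4x4 mesh, the East neighbor of the node in column 3 is the node in column 0 of the same row.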
Video Display


The frame buffer is distributed throughout the array of pixel nodes. An example is shown in Figure
1-8. The pixel funnel rearranges pixels from the frame buffer into a properly ordered raster scan
sequence. Both the video controller and the pixel funnel are software configurable for the five
different pixel arrays.


Figure 1-8: Pixel mapping in the distributed frame buffer for a PXM 916


Pixels on the Display 





The frame buffer stores (red, green, blue, α) values for each pixel; the video processor may
substitute the α value for the red, green, or blue value, based on the display mode:

OVERLAY_OFF:

$$RGB_{out} = rgb_{in}$$

OVERLAY_ON:

$$RGB_{out} = \begin{cases} rgb_{in} & \text{if } \alpha = 0 \\ \overline{rgb_{in}} & \text{if } \alpha = 255 \\ \alpha\alpha\alpha & \text{if } 0 < \alpha < 255 \end{cases}$$

OVERLAY_FORCE:

$$RGB_{out} = rgb_{in}$$

OVERLAY_MASK:

$$RGB_{out} = \begin{cases} rgb_{in} & \text{if } 0 \le \alpha < 128 \\ \alpha\alpha\alpha & \text{if } 128 \le \alpha \le 255 \end{cases}$$


The video processor uses six 256x10 lookup tables, or color maps, to translate 8-bit pixel color
values to 10-bit video data. Three of the tables map red, green, and blue. The other three map α
to red, green, and blue values.


There are two sets of color maps. One set contains high-speed video tables that are used to convert
video data. The other set consists of shadow tables that can be read and written via the VMEbus. The
contents of the shadow tables are automatically copied to the video tables during a vertical retrace
period, with copying enabled and disabled in software. The shadow tables prevent two problems
common to many video systems from arising: snowy and sheared video because color maps are
modified during active video periods, and distracting flashes on the screen because of partially
modified color maps.


In high-resolution mode, the video system displays 1024 lines of either 1024 (in systems with 16,
32, or 64 pixel nodes) or 1280 (in systems with 20, 40, or 64 nodes) pixels, at 60 Hz non-interlaced.
In NTSC mode, the video system uses the RS-170A format to display 485 lines of 720 pixels in all pixel
node configurations. In PAL mode, 575 lines of 720 pixels are displayed.
DEVtools supports the PAL and NTSC video formats. PAL is the video format used in Europe;
it corresponds to the NTSC format used in North America. PAL mode enables the Pixel Machine to
produce a PAL signal for European customers to use with their video equipment. The screen
resolution for PAL is 720x576; NTSC resolution is 720x485.

Using the Pixel Machine in PAL mode is similar to using it normally. All the user is required to do 
is set the appropriate model, which should be 964p for a single pipe 964 and 964pd for a dual pipe 
system, and issue a hypinit command to actually switch to PAL video format. To switch back to 
standard hi-res video, change the model to 964 or 964X and issue a hypinit command. 


For NTSC mode, set the model as you would normally but append an n to the end of it. For
example, if you have a 964 with a dual pipe, set the model to 964dn.


For high resolution the subscreens are: 


Model | Subscreens | Size
r9eex [1 | To0xi78 
968 [1 | tena | 
ris [ 4 | 128 









For PAL the subscreens are: 
For NTSC the subscreens are:












Model | Subscreens | Size
940n  | 1          | 72x122
932n  | 1          | 90x122
920n  | 1          | 144x122
916n  | 1          | 180x122








The DEVtools variables PMimax and PMjmax are set to these limits minus one. Therefore, on a 
model 940n (NTSC) PMimax would be equal to 71 and PMjmax would be 121. 
As described above, the Pixel Machine can be configured with five different pixel array sizes and
three different pipeline sizes. The ten models are described in Table 1-4. Models 964 and 964d
can be programmed to display either 1024x1024 or 1280x1024 pixels.


In high-resolution display mode, Models 916 and 920 have two rgbα frame buffers and one
z-buffer, with enough memory to render full-screen 32-bit images in double-buffered mode with a
floating point depth buffer. Models 932 and 940 have two off-screen buffers in addition to the two
displayable buffers, plus two z-buffers. In addition to the two displayable buffers and one z-buffer,
the model 964 has an additional six video buffers and three z-buffers in 1024x1024 mode. In
1280x1024 mode there are two additional video buffers and one additional z-buffer.
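
These buffer counts can be cross-checked against the per-node memory sizes given earlier (256k bytes of Z memory and 512k bytes of video memory per pixel node). The sketch below is a worked arithmetic check, not DEVtools code. It assumes 4 bytes per pixel for each rgbα buffer and each z-buffer, and it assumes that Model 916 has 16 pixel nodes and Model 964 has 64; the function names are hypothetical.

```c
/* Per-node memory from the text, in kbytes. */
enum { ZRAM_KBYTES = 256, VRAM_KBYTES = 512 };

/* Total pixel-node memory in Mbytes for a given number of pixel nodes. */
unsigned total_mbytes(unsigned pixel_nodes)
{
    return pixel_nodes * (ZRAM_KBYTES + VRAM_KBYTES) / 1024;
}

/* Mbytes needed by full-screen buffers of 4 bytes per pixel each
 * (assumed size for both rgb-alpha and floating point z-buffers). */
unsigned buffer_mbytes(unsigned width, unsigned height,
                       unsigned video_buffers, unsigned z_buffers)
{
    unsigned long bytes = (unsigned long)width * height * 4ul *
                          (video_buffers + z_buffers);
    return (unsigned)(bytes / (1024ul * 1024ul));
}
```

Under these assumptions, a 16-node system has 12 Mbytes, exactly what two video buffers and one z-buffer need at 1024x1024. A 64-node system has 48 Mbytes, which matches eight video buffers and four z-buffers at 1024x1024. At 1280x1024, four video buffers and two z-buffers need 30 Mbytes, leaving 18 Mbytes, which is consistent with the partial-buffer footnote to Table 1-4.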


In NTSC display mode, each pixel node has a single subscreen, regardless of configuration. The
subscreen size in Model 916 is 180x122 = 21960 pixels, while in the 964, it is one fourth as big.
This means that a 916 can render full-screen NTSC images about as fast as a 964 can in
high-resolution mode.
Table 1-4: Pixel Machine configurations


Model    Peak Performance    Nodes           Memory     Buffers     Bytes
Number   MIPS    MFLOPS      pipe   pixel    (Mbytes)   rgbα   z    per pixel
         80      160                         12         2      1    12
                                             12         2      1    12


* 964 programmed to display 1280x1024 pixels.
** 18 Mbytes in additional partial buffers are available.


DEVtools User's Guide, Version 1.0








Pixel Machine Software 


The distinct architectural components of the Pixel Machine are the host computer, the pipe nodes, 
and the pixel nodes. The host computer allows an application program to access the power and 
functionality of the Pixel Machine, the pipe nodes are responsible for the serial parts of algorithms, 
and the pixel nodes execute parallel algorithms. The following sections describe the software that 
supports each architectural component. 


Host Software 
A host-resident, C-callable library is responsible for command creation and transmission,
invocation of subprocesses that monitor external events, and machine initialization and control.


Commands are the packets of data that the host sends to the Pixel Machine to request actions or to
serve as data. Commands are discussed more completely in Chapter 2, "Writing Programs for the
Host". Commands should not be confused with messages, which are requests that originate in the
Pixel Machine and are directed to the host. Messages are explained in Chapter 3, "The DEVtools
Message Service Protocol".


The primary functions provided by the host are:

■ translating high-level function calls and macros into commands

■ transmitting commands over the VMEbus to the Pixel Machine

■ downloading code to the pipe and pixel nodes and initializing them

■ handling interactive functions (e.g., mouse/cursor interface)

■ processing message requests received via the parallel I/O from the Pixel Machine processors

All commands are sent to the first node in the pipeline. Commands proceed serially down the pipe
until the last node broadcasts them simultaneously to all of the pixel nodes. In systems without a
pipeline, the host sends commands directly to the pixel node broadcast bus.


Pipe Node Software 


The pipe nodes are typically used to implement a set of algorithms that act serially on a set of data. 
For example, a rendering and modeling application might use the pipeline to generate objects, apply 
modeling and viewing transformation, cull and shade the objects, apply projection transformations, 
do x, y, and z clipping, and finally, map the image to a viewport on the screen. 


A useful analogy is to think of the pipe nodes as UNIX® system filters. Each node, like a UNIX 
system filter, reads some input, transforms the input, and writes some output. 
The pipe program reads commands from the input FIFO. Each command consists of an opcode, the
number of parameters, and the parameter list. When a command arrives at a pipe node, one of four
actions can be triggered. The node can:


■ forward the command to the next node in the pipeline,
■ modify the parameter list and send it down the pipe,
■ process the command, possibly generating new commands, or
■ consume the command.
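
The four actions listed above can be pictured as a dispatch on the command opcode. The following is a self-contained sketch in ordinary C rather than pipe node code: the command structure, the opcodes, and the handle_command function are all hypothetical, chosen only to show one branch per action.

```c
#include <stddef.h>

#define MAX_PARAMS 8

/* Hypothetical command layout: opcode, parameter count, parameter list. */
typedef struct {
    int opcode;
    int nparams;
    float params[MAX_PARAMS];
} command;

typedef enum { ACT_FORWARD, ACT_MODIFY, ACT_PROCESS, ACT_CONSUME } action;

/* Hypothetical opcodes used by this sketch only. */
enum { OP_NOP = 0, OP_SCALE = 1, OP_SPLIT = 2, OP_LOCAL = 3 };

/* Handle one command at a pipe node. Commands to be sent down the pipe
 * are written to out[]; the number of output commands is returned. */
size_t handle_command(const command *in, command out[2], action *taken)
{
    switch (in->opcode) {
    case OP_SCALE:                 /* modify the parameter list, then send it on */
        out[0] = *in;
        for (int k = 0; k < in->nparams; k++)
            out[0].params[k] *= 2.0f;
        *taken = ACT_MODIFY;
        return 1;
    case OP_SPLIT:                 /* process: generate two new commands */
        out[0] = *in;
        out[0].opcode = OP_NOP;
        out[1] = *in;
        out[1].opcode = OP_NOP;
        *taken = ACT_PROCESS;
        return 2;
    case OP_LOCAL:                 /* consume: nothing continues down the pipe */
        *taken = ACT_CONSUME;
        return 0;
    default:                       /* not handled at this node: forward unchanged */
        out[0] = *in;
        *taken = ACT_FORWARD;
        return 1;
    }
}
```

A real pipe node would read each command from its input FIFO and write the results to the next node; here the input and output are plain structures so that the control flow can be followed in isolation.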


Each pipe node stores and executes routines that are invoked by command opcodes. In a typical
polygonal rendering and modeling application, the pipe nodes perform geometric processing
algorithms. For instance, three nodes could be assigned to clip the polygons in the x, y, and z planes,
and one node to shade the polygon. In order to optimize the shading, however, the system could 
easily be re-configured to have three shading nodes and only one clipping node. Thus the pipeline 
can be optimized for any application through experimentation and new functions can be added as 
needed. 


A Pixel Machine can contain no pipe, a single 9-node pipe, or 18 pipe nodes that can be configured as
either two parallel 9-node pipes or a single 18-node pipeline. In parallel mode, the functions of a single
pipeline are duplicated in both pipes.


Pixel Node Software 


Pixel nodes implement algorithms that can be done in parallel, like the raster-scan conversion of 
points, lines, and polygons, image compositing, and ray tracing. Because the frame buffer memory 
is distributed through the pixel node array [Figure 1-6], all routines that access the frame buffer are 
implemented here as well. In the pixel node array, identical functions are usually replicated in each 
node. 


The Distributed Frame Buffer 
Programming the pixel nodes to access the interleaved frame buffer requires an understanding of 
two concepts: 


1. an algebraic domain transformation that maps from a screen space coordinate system to a
processor space coordinate system, and


2. techniques for rendering images in a subscreen, the small, contiguous frame buffer that is 
attached to each pixel node. 
The domain transformation maps a point from Cartesian (x,y) screen space to (i,j) processor space
as follows:


$$i = \frac{1}{N_x}(x - O_x) \qquad j = \frac{1}{N_y}(y - O_y)$$


where $N_x$ and $N_y$ are the number of processors per row and column, respectively, in the pixel node
array, and are fixed for any given model of the Pixel Machine. $O_x$ and $O_y$ select a particular
processor in the array, with $O_x$ in the range $[0, N_x - 1]$ and $O_y$ varying between 0 and $N_y - 1$.


The transformations from processor to screen space are:


$$x = i N_x + O_x \qquad y = j N_y + O_y$$


The effort required to parallelize an existing algorithm involves the restructuring of the algorithm so 
that it operates in the (i,j) processor space rather than screen space. Any algorithm that processes 
each pixel independently, such as fractal generation or ray tracing, requires very little modification, 
because no coherence is required from one pixel to the next. The number and complexity of 
modifications required increases with the degree of coherency between one pixel and the next or one 
scan line and the next. Writing a program so that it adheres to the domain transformation
guarantees portability to single processor systems, where $N_x = N_y = 1$ and $O_x = O_y = 0$.


The pixel interleaving scheme presents an obstacle to applications that require a single pixel node to 
process and display a contiguous set of pixels. The serial I/O (SIO) capability of the pixel nodes 
provides a way to circumvent the problems created by interleaving. The set of pixels can be created 
in undisplayed memory and then routed, using SIO, to the pixel nodes that will display them. 


The pixel nodes are arranged in an n x m array, and the processor in the ith row, jth column handles
every nth pixel on every mth scan line (see Figure 1-6). Each processor addresses a portion of the
frame buffer, which it sees as a contiguous subscreen. The coordinate system of the subscreen is
called the processor space. DEVtools provides mapping functions from (x,y) screen space to (i,j)
processor space:

i = PMilo(scrn, x) returns the smallest integer $i \ge (x - O_x)/N_x$

i = PMihi(scrn, x) returns the largest integer $i \le (x - O_x)/N_x$

j = PMjlo(scrn, y) returns the smallest integer $j \ge (y - O_y)/N_y$

j = PMjhi(scrn, y) returns the largest integer $j \le (y - O_y)/N_y$


where $O_x$ and $O_y$ are the processor offsets in the x and y direction, respectively, and $N_x$ and $N_y$
are the numbers of processors in the x and y direction, respectively.
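
The "smallest integer not below" and "largest integer not above" definitions amount to ceiling and floor division. The sketch below shows that arithmetic in portable C for the x direction. It is not the DEVtools implementation: the functions take the processor count and offset as explicit arguments instead of a scrn handle, and all of the names are hypothetical.

```c
/* Floor and ceiling of a/b for b > 0, correct for negative a.
 * C integer division truncates toward zero, so adjust when needed. */
static int floor_div(int a, int b)
{
    int q = a / b;
    return (a % b != 0 && a < 0) ? q - 1 : q;
}

static int ceil_div(int a, int b)
{
    int q = a / b;
    return (a % b != 0 && a > 0) ? q + 1 : q;
}

/* Smallest i with i >= (x - ox)/nx, and largest i with i <= (x - ox)/nx. */
int ilo(int x, int nx, int ox) { return ceil_div(x - ox, nx); }
int ihi(int x, int nx, int ox) { return floor_div(x - ox, nx); }

/* Inverse mapping: processor-space i back to screen-space x. */
int i_to_x(int i, int nx, int ox) { return i * nx + ox; }

/* Nonzero when screen column x belongs to the processor with offset ox. */
int owns_column(int x, int nx, int ox)
{
    return ilo(x, nx, ox) == ihi(x, nx, ox);
}
```

With four processors per row and an offset of 1, screen column 9 maps to i = 2 on this processor, while column 6 falls between i = 1 and i = 2 and so belongs to another processor. Iterating i from ilo(xmin, ...) to ihi(xmax, ...) visits exactly the columns in [xmin, xmax] that this processor owns.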
Because there are more pixels in screen space than in processor space, the mapping is not
one-to-one. To ensure that the processor space pixel (i1,j1) is actually screen space pixel (x1,y1),
the following condition must hold:


    (PMilo(scrn,x1)==PMihi(scrn,x1)) && (PMjlo(scrn,y1)==PMjhi(scrn,y1))


Here is a simple example. The code segment shown in Example 1-1 draws a set of vertical and
horizontal lines in a screen space viewport defined by xmin, xmax, ymin, and ymax.


    for (x=xmin; x<xmax; x+=delta)
        for (y=ymin; y<ymax; y++)
            PMputpix(scrn, x, y, RED);


    for (y=ymin; y<ymax; y+=delta)
        for (x=xmin; x<xmax; x++)
            PMputpix(scrn, x, y, GREEN);





Example 1-1. Line drawing in screen space. 


This code segment can be converted into code that will run on a 964 by adding a conditional
statement to test for the pixel's presence in the processor space of this node, as shown below (Example
1-2).


    for (x=xmin; x<xmax; x+=delta)
        for (y=ymin; y<ymax; y++)
            if ((i=PMilo(scrn, x))==PMihi(scrn, x) && (j=PMjlo(scrn, y))==PMjhi(scrn, y))
                PMputpix(scrn[0], i, j, RED);


    for (y=ymin; y<ymax; y+=delta)
        for (x=xmin; x<xmax; x++)
            if ((i=PMilo(scrn, x))==PMihi(scrn, x) && (j=PMjlo(scrn, y))==PMjhi(scrn, y))
                PMputpix(scrn[0], i, j, GREEN);





Example 1-2. Line drawing in processor space.

The pixel node code shown above is straightforward but inefficient. It iterates across screen space
and does the processor space mapping and testing for each pixel. A better method is to iterate over
processor space, as shown in Example 1-3.
    imin = PMilo(scrn, xmin);
    imax = PMihi(scrn, xmax);
    jmin = PMjlo(scrn, ymin);
    jmax = PMjhi(scrn, ymax);


    for (i=imin; i<=imax; i+=delta)
        for (j=jmin; j<=jmax; j++)
            PMputpix(PMscrns[0], i, j, RED);


    for (j=jmin; j<=jmax; j+=delta)
        for (i=imin; i<=imax; i++)
            PMputpix(PMscrns[0], i, j, GREEN);





Example 1-3. Efficient line drawing in processor space. 


In the next section, a more complicated example of an algorithm that might be implemented in the 
Pixel Machine nodes is presented. 


Visualizing Complex Functions 


Fractal geometry is a branch of mathematics used to describe self-similar structures such as those 
observed in nature. The Julia set is a class of fractals in the complex plane. A generating function 
is evaluated at discrete points in a complex range until the function diverges or a maximum number 
of iterations is reached. 


The generating function is a squaring function of the form:


$$\begin{aligned} Z_{n+1} &= Z_n^2 + C = (X_n + Y_n i)^2 + (P + Q i) \\ &= X_n^2 - Y_n^2 + 2 X_n Y_n i + P + Q i \\ &= X_{n+1} + Y_{n+1} i \end{aligned}$$

where $X_{n+1} = X_n^2 - Y_n^2 + P$ and $Y_{n+1} = 2 X_n Y_n + Q$.


Different values of P and Q define different Julia sets. If $z = X_n^2 + Y_n^2$ is greater than a
pre-specified limit, the function has diverged.
The Standard Implementation


The Julia set is displayed by mapping a rectangular region of the complex plane onto a raster 
display. The complex region is described by the real and imaginary ranges, relo to rehi, and imlo to 
imhi. The real axis is plotted in the x direction and the imaginary axis in the y direction. 


The generating function is evaluated at discrete complex coordinates corresponding to each pixel in
the viewport. The complex coordinates are defined by a linear mapping from screen space to
complex space:


$$re = a_1 x + b_1 \qquad (1)$$

$$im = a_2 y + b_2 \qquad (2)$$


The algorithm loops over all pixels in the range [xmin, xmax], [ymin, ymax]. The generating
function is iterated at each pixel, with initial values $X_0 = re$ and $Y_0 = im$. The iteration continues until
the square of the magnitude of the generating function, z, diverges from the origin (z > zmax) or a
pre-set number of iterations is reached (n = nmax).


If the function does not diverge within a limited number of iterations, the pixel color is based on the 
final z value. Otherwise, the intensity is based on the number of iterations performed. 


Pixel Machine Implementation

The transformation from (i,j) processor space to (x,y) screen space is given by the equations:

$$x = i N_x + O_x \qquad y = j N_y + O_y$$


where $N_x$ and $N_y$ are the number of processors in x and y, respectively, and $O_x$ and $O_y$ are the
offsets into the two-dimensional processor array. Each processor loops over pixels in the range imin
to imax, jmin to jmax. The library functions PMilo(), PMihi(), PMjlo(), and PMjhi() are used
to map the given (x,y) limits into boundaries in (i,j) space for each processor.


Equations 1 and 2 provide the mapping from a given (x,y) screen coordinate to its corresponding
(re, im) complex coordinates. Library functions PMfxtoi() and PMfytoj() transform a screen space
equation into a processor space equation by modifying its coefficients. Equations 1 and 2 become:


$$re = a_1' i + b_1' \qquad im = a_2' j + b_2'$$
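
Substituting $x = i N_x + O_x$ into $re = a_1 x + b_1$ gives $re = (a_1 N_x) i + (a_1 O_x + b_1)$, so the new coefficients are $a_1' = a_1 N_x$ and $b_1' = a_1 O_x + b_1$. The sketch below applies that substitution in C. It is a hypothetical stand-in for the kind of coefficient update the text attributes to PMfxtoi() and PMfytoj(); the function name and its explicit n and o arguments are assumptions made for this example.

```c
/* Rewrite value = a*x + b (screen space) as value = a*i + b (processor
 * space), using x = i*n + o. Hypothetical helper, not the DEVtools API. */
void screen_to_processor_coeffs(float *a, float *b, int n, int o)
{
    *b = *a * (float)o + *b;   /* b' = a*O + b, computed with the original a */
    *a = *a * (float)n;        /* a' = a*N */
}
```

For example, with a = 0.5, b = 1.0, four processors per row, and an offset of 1, the coefficients become a = 2.0 and b = 1.5. Processor column i = 2 then yields 5.5, the same value that the original equation gives for screen column x = 9.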


Once these initial transformations have been performed, the algorithm proceeds exactly as the 
sequential one does, except that each processor loops from imin to imax and jmin to jmax, as 
opposed to xmin to xmax and ymin to ymax. When a pixel value has been determined, its color 
is set using the function PMputpix(). 
Example 1-4 shows the two implementations. Example 1-4(a) is the sequential algorithm, operating
in screen space. Example 1-4(b) is the parallel algorithm, operating in processor space. The lines
that differ are shown in boldface, illustrating the minimal changes which are required to adapt
existing algorithms to the Pixel Machine.


    a1 = (rehi - relo) / (xmax - xmin);
    b1 = relo - a1*xmin;
    a2 = (imhi - imlo) / (ymax - ymin);
    b2 = imlo - a2*ymin;


    for (y=ymin; y<=ymax; y++)
        for (x=xmin; x<=xmax; x++) {
            re = a1*x + b1;
            im = a2*y + b2;

            done = FALSE;
            for (n=0; n<nmax && !done; n++) {
                if ((z = re*re + im*im) <= zmax) {
                    temp_im = 2*re*im + Q;
                    re = re*re - im*im + P;
                    im = temp_im;
                }
                else done = TRUE;
            }

            if (done) write_pixel(x, y, value_based_on_n);
            else write_pixel(x, y, value_based_on_z);
        }





(a) The standard implementation. 
    a1 = (rehi - relo) / (xmax - xmin);
    b1 = relo - a1*xmin;
    a2 = (imhi - imlo) / (ymax - ymin);
    b2 = imlo - a2*ymin;


    imin = PMilo(scrn, xmin);
    jmin = PMjlo(scrn, ymin);
    imax = PMihi(scrn, xmax);
    jmax = PMjhi(scrn, ymax);


    PMfxtoi(scrn, a1, b1);
    PMfytoj(scrn, a2, b2);


    for (j=jmin; j<=jmax; j++)
        for (i=imin; i<=imax; i++) {
            re = a1*i + b1;
            im = a2*j + b2;

            done = FALSE;
            for (n=0; n<nmax && !done; n++) {
                if ((z = re*re + im*im) <= zmax) {
                    temp_im = 2*re*im + Q;
                    re = re*re - im*im + P;
                    im = temp_im;
                }
                else done = TRUE;
            }

            if (done) PMputpix(scrn, i, j, value_based_on_n);
            else PMputpix(scrn, i, j, value_based_on_z);
        }


(b) The Pixel Machine implementation.


Example 1-4. Fractal functions: Sequential and parallel implementations of the Julia set. 
Header Files and Subroutine Libraries


Introduction 

An application program for the Pixel Machine can be written to run in the host, the pipe nodes, the 
pixel nodes, or a combination of the three. The header files and libraries that provide useful 
definitions and functions for all three programming environments are discussed in this chapter. 


Following is a list of header files that are used to compile programs for the host and for the Pixel
Machine:


Table 2-1: Host and Pixel Machine Header Files


Host            Pixel Machine

devtools.h      pxm.h
devcommand.h    libmath.h
devimage.h      syscmd.h
deverror.h      pageregs.h
msgserve.h      model.h
sysmsg.h        pipe.h
crt0.h          pixel.h
pipe.h          sysmsg.h
pixel.h





The first section describes writing host programs that can control and communicate with the Pixel 
Machine. The second section describes the library that contains Pixel Machine functions for the 
pipe and pixel nodes for the DSP32. The final section describes the libraries that contain DSP32 
routines. These are general purpose routines, not limited to use on a Pixel Machine. 


Documentation 
All the functions for the host and Pixel Machine are described in the DEVtools Reference Manual
and are included in the on-line manual pages. The DSP32 libraries are described in the WE®
DSP32 and DSP32C C Language Compiler: Library Reference Manual.
Introduction


To compile and link programs that use DEVtools you need to know where the header files,
executable files, and libraries reside on your system. devtools, the DEVtools directory, can usually be
found in the directory hyper. This directory usually resides in /usr, but it can be located elsewhere
on your system; you may need to check with your system administrator. The examples in this
section assume that the DEVtools directory is called /usr/hyper/devtools.


Setting up your Environment 


The PXMtools software provides files in the /usr/hyper directory that can be used to initialize the 
execution search path and environment variables that you need to use the DEVtools software. The 
files are: 


.hyper_profile   Profile file for Bourne shell and Korn shell
.hyper_env       Environment definition file for Korn shell
.hyper_login     Login initialization file for C shell
.hyper_cshrc     C shell startup file





To use the DEVtools software, your executable program search path list must include the required 
hyper directories, and must also contain: 


Name                             Function

/usr/hyper/devtools/bin          Contains DEVtools executables such as devprint
/usr/hyper/devtools/dsp32/bin    Contains the executables for the DSP32 Support Software Library





The environment variable DSP32SL must contain the pathname of the directory that contains the 
DSP32 Support Software Library, usually /usr/hyper/devtools/dsp32. 


To access the DEVtools online manual pages, your MANPATH environment variable must include 
/usr/hyper/devtools/man. 
Compiling DEVtools Programs for the Pixel Machine


Pixel Machine programs are compiled using the devcc command. devcc is similar to the d3cc
command (described in the DSP32 C Language Compiler User's Manual), but it knows about the
special files needed for compiling and linking Pixel Machine programs. These special files are the
DEVtools library, include files, startup code, and the loader directive file. A typical command line
used to compile a Pixel Machine program is:


    devcc -c ctest.c

Additional information about devcc can be found on the manual page in the DEVtools Reference
Manual, and more information on the DSP32 Support Software Library can be found in the DSP32
Support Software manuals:


DSP32 Software Support Library User's Manual 
DSP32 C Language Compiler User’s Manual 
DSP32 C Language Compiler Library Reference Manual 


Linking DEVtools Programs for the Pixel Machine 


Pixel Machine programs may be linked using devcc or d3ld. After the Pixel Machine library has
been specified, any of the DSP32 Support Software Library libraries can be specified. libc, libm,
and libap may be specified using the -lc, -lm, and -lap compiler or linker options.


devcc supplies the linker with the appropriate options to successfully link programs that run on the
Pixel Machine. Following is a typical command line used to link a Pixel Machine program:


    devcc -o ctest.dsp ctest.o


In the few cases where the loader must be called explicitly, the link command must also provide the 
following information: 


■ /usr/hyper/devtools/lib/crt0_pixel.o or crt0_pipe.o: the startup (crt0) file


■ /usr/hyper/devtools/include/pixel_ifile or /usr/hyper/devtools/include/pipe_ifile: the
memory usage definition file


■ /usr/hyper/devtools/lib/libpm.a

Following is an example of how to use d3ld to link a Pixel Machine program:

    d3ld /usr/hyper/devtools/include/pixel_ifile \
        /usr/hyper/devtools/lib/crt0_pixel.o \
        ctest.o \
        /usr/hyper/devtools/lib/libpm.a \
        -lc \
        -o ctest.dsp





The above command links a program that runs in a Pixel node. For Pipe node programs the startup 
file would be: 


/usr/hyper/devtools/lib/crt0_pipe.o

and the loader directive file would be:

/usr/hyper/devtools/include/pipe_ifile

When you use devcc, pipe or pixel programs can be specified with the -pipe or -pixel command
line options (-pixel is the default). With these options devcc specifies the appropriate files to
d3ld. See the manual page for devcc in the DEVtools Reference Manual for more information.


Stack Configuration 


The default action of the loader is to load the stack segment immediately after the text segment. 
Because the DSP C compiler grows the stack from low memory to high memory, the stack is 
allowed to grow to fill all available memory in bank0. 


The stack section is set up differently in the pipe and pixel nodes as explained below. 


Pipe Node Stack 


In a pipe node, the stack is set to be a minimum of 320 bytes long. The loader exits with an error 
if there is not enough room for the minimum stack. It is possible to change the default minimum to 
something less than 320 bytes, but it is the user’s responsibility to make sure that the new minimum 
is sufficient.


To change the minimum stack size it is necessary to assemble and link a new stack.o with your
program to replace the one that is automatically loaded from libpm.a. A sample stack.s
can be found in /usr/hyper/devtools/lib/stack.s. To change the size of the stack,
copy this file to another directory and assemble it, defining PM_STACK_SZ as the desired stack size
and also defining PIPE. For example, to change the stack size to 1024 bytes use:

d3as -DPM_STACK_SZ=1024 -DPIPE stack.s


The stack.o that is produced should then be linked with the user program. Be sure to include it 
before libpm.a. The stack size should be a multiple of 4. 


Pixel Node Stack 


In the pixel nodes the stack is set up differently. Because there is a relatively large initialization 
function (_initpixel()) that is called once and only once before main(), it is loaded above the stack 
in a section called .init, so that the stack can grow into it and reuse the space. The stack itself is 
initially given only 30 bytes, but the initialization code affords approximately another 1.5 kilobytes. 
Again, the loader will produce an error if this minimum stack does not fit. 


The user is still able to enlarge the stack, but should not reduce it any further. The same technique 
that is used for the pipe nodes is used here except that PIXEL needs to be defined. For example: 


d3as -DPM_STACK_SZ=1024 -DPIXEL stack.s


Compiling DEVtools Programs for the Host System 


Host programs are compiled using the cc command. To locate the header files, the directories that 
contain the Pixel Machine header files and the DEVtools header files must be specified on the cc 
command line. The cc command line should include the following options: 


-I/usr/hyper/devtools/include


A typical command line used to compile a host program is:

cc -c -I/usr/hyper/devtools/include host.c


Linking DEVtools Programs for the Host System 


Host programs should be linked using the cc command. The name of the Pixel Machine DEVtools
host library (devlib.a) must be included on the cc command line.

A typical command line used to link a host program is:


cc -o host host.o /usr/hyper/devtools/lib/devlib.a 

Versions of devlib.a are provided that support the Sun floating point accelerator (fpa), or support
profiling, or support both fpa and profiling. Fpa support is only provided for the Sun 3 libraries.
The names of the libraries are:

devlib.a          Any host, without profiling
devlib_p.a        Any host, with profiling
devlib_ffpa.a     Uses fpa, does not include profiling code (Sun 3 only)
devlib_ffpa_p.a   Uses fpa, includes profiling code (Sun 3 only)





Sample Programs 


The DEVtools package includes a set of sample programs that illustrate the use of DEVtools for a 
variety of applications. The sample programs are located in /usr/hyper/devtools/sample/misc. 
The sample directory contains the directories: 


bin       host executable programs
boot      Pixel Machine executable programs and shell scripts to run the sample programs
host      host source files
include   host and Pixel Machine include files for sample programs
pipe      pipe node source files
pixel     pixel node source files





The host, pipe, and pixel directories each contain source code and a makefile. These files provide a 
good place to look for efficient usage of many Pixel Machine functions. The makefile can be used 
to generate the executable versions of the sample programs, and is a good guide for constructing 
makefiles for your own programs. 


The following lists the shell scripts that can be invoked to run the sample programs, and describes 
the functions illustrated by the program. 


■ Circle: A simple program to draw a large circle on the screen. A simple example of the use
of subscreen information.

■ Colors: A very simple program. Clears the screen to red, then to grey. Uses PMapply() to
replicate a function for each subscreen.

■ Copies: Makes multiple, overlapping copies of the upper left-hand corner using the VRAM
and ZRAM copy routines.

■ Dataflow: Generates commands in one of the pipe nodes. These commands are passed
down the pipe into the pixel nodes. A host program (devprint) is used to output print
commands from the pixel nodes. Shows the use of command processing functions and the use of
printf.

■ Fastpixels: Demonstrates an efficient way to fill contiguous pixels.

■ Hello: Clears the screen and uses printf. Prints the node ID.

■ Julia: Displays and animates Julia set fractals. Uses subscreens and the PMilo()/PMihi()
and PMjlo()/PMjhi() macros, as well as double buffering. Shows the processing power of
the Pixel Machine.

■ Led: Turns off the vsync and psync LEDs on the pixel processor boards. Only interesting
if the cover of the machine is removed.

■ Lights: Flashes vsync and psync LEDs on the pixel processor boards. Only interesting if
the cover of the machine is removed.

■ Mand: Mandelbrot set; another fractal.

■ Math: Uses a number of math library functions.

■ NTSC: Displays colored bars on the screen. Can be used with any type of display.

■ Pipes: Shows how to pass data from one pipe node to the next.

■ Pong: A sample animation of a bouncing ball.

■ Pxmclear: Clears the front and back buffers to black.

■ Qcopies: Uses fast ZRAM copy to replicate the image in the front buffer.

■ Send: A host program and associated pixel node programs. Implements a user-message
handling routine on the host to route messages from one node to any other node.

■ Shift: Moves pixels around the screen using serial I/O.

■ Texture: Generates a random texture.

■ Zstuff: Uses a host program to set the contents of the Z memory and a pixel node program
to read the contents from Z memory.

■ Ztest: Sample use of ZRAM allocation routines.

Introduction


The operation of the Pixel Machine is controlled by the host system to which it is attached. Run- 
ning a program on the Pixel Machine requires the use of a group of system control commands that 
perform functions such as resetting the Pixel Machine processors, loading programs into the 
memory of the processors, displaying status information, etc. More detailed information about the 
system control commands can be found in the PXMtools manual pages. 


How to Run a Program on the Pixel Machine 


The execution of Pixel Machine programs is typically controlled by a program executing on the host 
system. Some simple programs that require no interaction with the host can be run without a host 
program through the use of the hypload and hyprun commands. 

Host programs provide:

■ a simple mechanism to ensure that the proper programs are loaded into the pipe and pixel
nodes

■ a convenient method of controlling the Pixel Machine

■ a message passing protocol that allows a user program running on the Pixel Machine to
signal the host program. This feature can be used to send data to the host, request data from the
host, and to perform any other tasks that the nodes cannot perform by themselves.

■ the ability for the Pixel Machine to output information on the host using the DEVtools
printf routine

■ other control functions that must be performed by the host such as selecting the serial I/O
direction.


The host program should begin by calling DEVinit. This opens the Pixel Machine and resets all of 
the processors in the Pixel Machine. Before exiting the host program, DEVexit should be called to 
release the Pixel Machine so that it can be accessed by other users. 


Programs that require no communication with the host can be loaded and started by the hypload 
and hyprun commands. 


Following is an example of the commands used to run a program that does not require any host 
communication on all of the pixel nodes: 

hyplock
hypload -dall prog.dsp
hyprun -dall
hypfree





hyplock should be used to lock the Pixel Machine before running programs that are not controlled 
by a host program to prevent other users from accessing the machine while the current program is 
running. After the program has finished, executing hypfree makes the machine available for other 
users. When the Pixel Machine is controlled by a host program, hyplock and hypfree are not
needed.


Refer to the PXMtools manual pages for more information about these programs. 

Host programs are written in C and compiled, linked, and loaded with the standard compiler,
libraries, and header files. In addition, there are libraries and header files of special functions and 
definitions used to control and communicate with the Pixel Machine. 


The devlib Library 


devlib includes functions for sending commands to the pipe and pixel nodes and for writing a
message server to respond to messages from the nodes. host/devtools.h should be included in all
programs that use devlib. host/msgserve.h is required if the message polling functions of DEVtools
are used. host/devcommand.h supplies macros for sending properly formatted messages to the
pixel and pipe nodes.


Host programs tend to focus on sending and receiving commands from the Pixel Machine. This 
section will describe the format of a command and the routines for reading and writing them, and 
then discuss a message server program on the host and how it might be extended to handle user- 
defined messages. 


Commands 


Commands are made up of an opcode, a parameter count, and a parameter list, as shown in Figure 
2-1. 


Figure 2-1: Command format 


[Figure: command layout showing the opcode, the parameter count, and parameter[1] ... parameter[count]]





Commands are generated on the host and written to the pipe node FIFOs using the DEVcwrite
macros. DEVcommand is used to encode an opcode and parameter count into a properly
formatted 32-bit value that can be passed to the DEVcwrite macros. There are twelve DEVcwrite
macros, each handling a different number of parameters.

DEVcwrite0( DEVcommand(opcode, 0));
DEVcwrite1( DEVcommand(opcode, 1), type, arg1);
DEVcwrite2( DEVcommand(opcode, 2), type, arg1...arg2);
DEVcwrite3( DEVcommand(opcode, 3), type, arg1...arg3);
DEVcwrite4( DEVcommand(opcode, 4), type, arg1...arg4);
DEVcwrite5( DEVcommand(opcode, 5), type, arg1...arg5);
DEVcwrite6( DEVcommand(opcode, 6), type, arg1...arg6);
DEVcwrite7( DEVcommand(opcode, 7), type, arg1...arg7);
DEVcwrite8( DEVcommand(opcode, 8), type, arg1...arg8);
DEVcwrite9( DEVcommand(opcode, 9), type, arg1...arg9);
DEVcwrite10( DEVcommand(opcode, 10), type, arg1...arg10);

DEVcwriten( DEVcommand(opcode, count), type, arg_array, count);


Commands with ten or fewer arguments are assembled more efficiently by the count-specific
DEVcwrite macro. Arguments can be either integer, floating point, or another type that represents a
properly aligned 32-bit value, but all arguments to a command must have the same type. The type
parameter to the DEVcwrite macros is a type name, either int, float or other type name.
DEVwrite macros are similar to DEVcwrite macros but they do not include command arguments.


There is another set of twelve macros for writing commands to the second pipeline whenever a
system with eighteen pipe nodes is configured with two parallel pipelines. They are identical in form
and function to the macros presented above, except that _alt is appended to the macro name (e.g.,
DEVcwrite0_alt).


Four macros for reading commands are also defined in devcommand.h: DEVcread() and 
DEVcread_alt() read a command and parameter count, and DEVread() and DEVread_alt() read 
the arguments. These are used for reading the feedback FIFOs. 
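
The command format described above can be sketched in portable C. This is only an illustration, not the devlib implementation: the names pack_command, command_opcode, command_count and build_command are hypothetical, and the bit positions (opcode in the upper 16 bits, count in the lower 16 bits) are an assumption. The real encoding is whatever DEVcommand in host/devcommand.h produces.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packing: opcode in the upper 16 bits, parameter count in
 * the lower 16 bits. The real layout is whatever DEVcommand produces. */
uint32_t pack_command(uint16_t opcode, uint16_t count)
{
    return ((uint32_t)opcode << 16) | (uint32_t)count;
}

uint16_t command_opcode(uint32_t word) { return (uint16_t)(word >> 16); }
uint16_t command_count(uint32_t word)  { return (uint16_t)(word & 0xFFFFu); }

/* Lay out one command as a header word followed by its parameters.
 * Returns the number of 32-bit words written, or 0 if out is too small. */
size_t build_command(uint32_t *out, size_t capacity, uint16_t opcode,
                     const uint32_t *params, uint16_t count)
{
    size_t i;

    if (capacity < (size_t)count + 1)
        return 0;
    out[0] = pack_command(opcode, count);
    for (i = 0; i < count; i++)
        out[i + 1] = params[i];
    return (size_t)count + 1;
}
```

A command with two parameters therefore occupies three 32-bit words in this sketch: one header word and two parameter words.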


High Level Functions 


Table 2-2 shows the high level devlib routines that are most commonly
used by host programs. DEVinit() and DEVexit() are the recommended ways to start and finish
host programs. The other routines are part of the message passing support, and many of them will
be used in the host program that is described in the next few paragraphs.

Table 2-2: High Level Functions

Name                    Function
DEVexit()               halt processors and close Pixel Machine device
DEVget_scan_line()      upload an image or a portion of an image to a Pixel Machine
DEVinit()               open and initialize Pixel Machine device
DEVpipe_boot()          load a DSP executable into the specified pipe nodes
DEVpixel_boot()         load a DSP executable into the specified pixel nodes
DEVpoll_nodes()         polls DSP processors for messages
DEVput_scan_line()      download an image or a portion of an image to a Pixel Machine
DEVrun()                begins execution of the current Pixel Machine program
DEVswap_pipe()          reverses the roles of the primary and secondary pipes
DEVuser_msg_enable()    define a message code and associated functions
DEVwait_exit()          wait for pixel nodes to signal completion, then call DEVexit()












The host program performs certain functions on behalf of the Pixel Machine. These functions 
include initializing the system, loading programs into the nodes, beginning execution, and servicing 
message requests from the Pixel Machine. Message requests are used to perform other actions such 
as input/output (I/O) operations, controlling serial I/O, etc. Following is a sample host program: 

#include <stdio.h>
#include <host/devtools.h>

main()
{
    DEVpixel_system *pixel_system;

    /* Open the Pixel Machine and reset its processors. */
    if ((pixel_system = DEVinit()) == NULL) {
        fprintf(stderr, "Open of Pixel Machine failed.");
        exit(1);
    }

    /* Load all of the pipes with "pipe.dsp". */
    DEVpipe_boot(pixel_system, "pipe.dsp", 0,
        DEVlast_pipe(pixel_system), NULL, DEV_BOOT_CHECK_TIME);

    /* Load all of the pixels with "pixel.dsp". */
    DEVpixel_boot(pixel_system, "pixel.dsp", 0,
        DEVlast_pixel(pixel_system), NULL, DEV_BOOT_CHECK_TIME);

    /* Begin execution */
    DEVrun(pixel_system);

    /* Poll the nodes for message requests. DEVpoll_nodes returns when a
       "host exit" message is received from a node. */
    DEVpoll_nodes(pixel_system, 0, DEVlast_pipe(pixel_system),
        0, DEVlast_pixel(pixel_system), DEV_FOREVER, DEV_NONE);

    /* Close the Pixel Machine. */
    DEVexit();
}





Example 2-1. Sample Host Program 


To customize the host program to receive application-specific messages, calls to
DEVuser_msg_enable() can be inserted after DEVinit() and before the polling loop. Each user
message has a unique opcode in the range (0, DEV_HIGHEST_USER_MESSAGE), defined in
host/msgserve.h, and specifies two functions. The first routine is called if the message is received
from a pipe node, and the second one is used when a pixel node sends the message. See the
manual page for DEVuser_msg_enable() for more details.
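
The idea of registering two handlers per message opcode can be sketched as a small dispatch table. This is a self-contained illustration only and does not use devlib: msg_enable, msg_dispatch, the handler signature, and the limit of 15 are all hypothetical stand-ins, and treating the opcode range as inclusive at both ends is an assumption. The actual interface is the one documented on the DEVuser_msg_enable() manual page.

```c
#include <stddef.h>

#define HIGHEST_USER_MESSAGE 15  /* stand-in for DEV_HIGHEST_USER_MESSAGE */

typedef int (*msg_handler)(int node, int value);

struct msg_entry {
    msg_handler from_pipe;   /* called when a pipe node sends the message */
    msg_handler from_pixel;  /* called when a pixel node sends the message */
};

static struct msg_entry msg_table[HIGHEST_USER_MESSAGE + 1];

/* Register both handlers for one opcode; returns 0, or -1 if out of range. */
int msg_enable(int opcode, msg_handler from_pipe, msg_handler from_pixel)
{
    if (opcode < 0 || opcode > HIGHEST_USER_MESSAGE)
        return -1;
    msg_table[opcode].from_pipe = from_pipe;
    msg_table[opcode].from_pixel = from_pixel;
    return 0;
}

/* Route one received message to the handler for its sender type.
 * Returns the handler's result, or -1 if no handler is registered. */
int msg_dispatch(int opcode, int from_pixel_node, int node, int value)
{
    msg_handler h;

    if (opcode < 0 || opcode > HIGHEST_USER_MESSAGE)
        return -1;
    h = from_pixel_node ? msg_table[opcode].from_pixel
                        : msg_table[opcode].from_pipe;
    return h != NULL ? h(node, value) : -1;
}

/* Two toy handlers used only to exercise the table. */
int sum_handler(int node, int value)  { return node + value; }
int diff_handler(int node, int value) { return value - node; }
```

Keeping separate pipe and pixel entries per opcode mirrors the two functions that each user message specifies.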

Low Level Functions


Table 2-3 contains low level system control and I/O functions. 





Table 2-3: Control and I/O Functions

Name                            Function
DEVclose()                      disconnect the host program from the Pixel Machine device
DEVfifo_parallel()              configure 18 pipe nodes as two parallel pipes
DEVfifo_read()                  read 4 bytes from a pipe node FIFO
DEVfifo_reset()                 reset all FIFOs on a board
DEVfifo_serial()                configure 18 pipe nodes as one long serial pipeline
DEVfifo_write()                 write 4 bytes to a pipe node FIFO
DEVget_color_map()              fetch the contents of the color tables
DEVget_pixel()                  read a pixel from the frame buffer
DEVload_color_tables()          load the color tables using a gamma correction table
DEVload_linear_ramp()           load the color tables with a linear table (no gamma correction)
DEVlock()                       manage Pixel Machine locks
DEVopen()                       connect a user program to a Pixel Machine
DEVopen_system()                allow system to be opened without resetting config information
DEVpipe_enable_error_halt()     set DSP to halt on hardware errors
DEVpipe_get()                   read data from a pipe node
DEVpipe_get_msg()               read data from a pipe node
DEVpipe_get_pir()               read the PIR register in a pipe node
DEVpipe_halt()                  halt a pipe node
DEVpipe_id_check()              verify a pipe node’s identification block
DEVpipe_id_print()              print a pipe node’s identification block on stdout
DEVpipe_id_write()              write a pipe node identification block into memory
DEVpipe_put()                   send a block of data to a pipe node
DEVpipe_read()                  read data from a pipe node
DEVpipe_run()                   initialize and start a pipe node
DEVpipe_write()                 DMA a buffer of data to a pipe node
DEVpixel_buffer()               select one of the two frame buffers for display
DEVpixel_enable_error_halt()    set DSP to halt on hardware errors
DEVpixel_get()                  read data from a pixel node
DEVpixel_get_msg()              read data from a pixel node
DEVpixel_get_pir()              read the PIR register in a pixel node
DEVpixel_halt()                 halt a pixel node
DEVpixel_id_check()             verify a pixel node’s identification block
DEVpixel_id_print()             print a pixel node’s identification block on stdout
DEVpixel_id_write()             write a pixel node identification block into memory
DEVpixel_mode_init()            initialize pixel mode register
DEVpixel_mode_overlay()         set overlay mode in the pixel mode register


DEVtools User’s Guide, Version 1.0 


Writing Programs for the Host 





Table 2-3: Control and I/O Functions (continued)

Name                            Function
DEVpixel_mode_serial()          set serial I/O direction in the drawing mode register
DEVpixel_overlay()              set overlay mode in the pixel nodes
DEVpixel_put()                  send a block of data to a pixel node
DEVpixel_read()                 read data from a pixel node
DEVpixel_run()                  initialize and start a pixel node
DEVpixel_start()                start a program running in a pixel node
DEVpixel_write()                DMA a buffer of data to a pixel node
DEVput_color_map()              update contents of the color tables
DEVput_pixel()                  write a pixel value into the frame buffer
DEVread_z()                     read from the z-buffer memory of a pixel node
DEVserial_direction()           update serial I/O link direction
DEVshadow_off()                 turn off shadow palette update to allow color tables to be updated
DEVshadow_on()                  turn on shadow palette update after color tables have been updated
DEVunit()                       return the value of HYPER_UNIT environment variable
DEVwrite_z()                    write to the z-buffer memory of a pixel node





System Status Tracking 


Pixel Machines are frequently used by a number of people, each of whom can be running a different 
application, and possibly even using different libraries. For example, a single system may be used 
for applications written using PIClib, RAYlib, and DEVtools. Furthermore, several DEVtools users 
may each require different DSP code to be loaded in the Pixel Machine. 


Even if all of the users of a Pixel Machine are running a single application, there still can be differ- 
ences in configurations based on pipe modes (parallel vs. serial), video format (hi-res vs. NTSC) 
and other configuration parameters. 


All Pixel Machine software maintains a file that reflects the current status of the Pixel Machine. 
This status information includes the: 


■ number of pipe nodes

■ number of pixel nodes

■ current pipe mode (serial, parallel)

■ current video format (hi-res, NTSC, PAL)

■ current gamma correction mode

■ current video options (sync source, etc.)

■ pathname and modification time of the executable file loaded into each pipe and pixel node


When a program is invoked that uses the Pixel Machine, the current state of the machine is
compared with the configuration parameters specified by the user’s environment (HYPER_MODEL,
HYPER_PIPE, etc.). The DSP executables required for the user’s application (as specified by
DEVpipe_boot and DEVpixel_boot, or as implicitly specified by library functions such as
PICinit) are compared with those currently loaded. If the appropriate software is not already
present in the machine, it will automatically be loaded.


Users that are developing DSP software can request that the file modification times of the execut- 
able files be compared with those of the files currently loaded in the machine. This allows new ver- 
sions of files to be loaded automatically. 


When a file is loaded into a Pixel Machine node, a checksum value is computed based on the path- 
name of the file and the process ID of the process performing the load operation. Subsequently, 
when another program checks whether the correct files are loaded, it first compares the pathname of 
the desired file with the pathname of the loaded file (relative pathnames are converted to absolute 
pathnames by prepending the current directory name). If the file names match, the modification 
times are compared (if this option has been selected). Finally, the checksum value stored in the 
node’s memory is compared with the value in the status file. If the checksums match, the program 
is not reloaded. The checksum is a safeguard to ensure that the system can not be fooled by a cor- 
rupted status file or by turning off the Pixel Machine. 
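
The order of the checks described above can be sketched as a single decision function. This is a minimal illustration, not the DEVtools implementation: the load_record structure, the needs_reload name, and the use of a long for the modification time are all assumptions, and the checksum itself is treated as an opaque number because its computation is not given here.

```c
#include <string.h>

/* What the status file is assumed to record for one node. */
struct load_record {
    const char *path;        /* absolute pathname of the loaded executable */
    long mtime;              /* modification time recorded at load time */
    unsigned long checksum;  /* checksum recorded in the status file */
};

/* Returns 1 if the node must be reloaded, 0 if the loaded program can be
 * kept. node_checksum is the value read back from the node's memory;
 * check_mtime selects the optional modification time comparison. */
int needs_reload(const struct load_record *recorded,
                 const char *wanted_path, long wanted_mtime,
                 unsigned long node_checksum, int check_mtime)
{
    if (strcmp(recorded->path, wanted_path) != 0)
        return 1;                      /* a different file is loaded */
    if (check_mtime && recorded->mtime != wanted_mtime)
        return 1;                      /* file changed since it was loaded */
    if (recorded->checksum != node_checksum)
        return 1;                      /* status file and node disagree */
    return 0;
}
```

The final comparison is the safeguard: even when the pathname and time agree, a node that was powered off no longer holds the recorded checksum, so the program is loaded again.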


The status file is read by DEVinit() and written by DEVexit(). The status information is
maintained in memory during the execution of the host program. During execution, the disk copy of the
status file is marked as invalid. As a result, executing a command that checks the status file
(hypid, for example) will result in a "node checksum does not match" message.

libpm


libpm is the library that supplies subroutines for use in both the pipe and pixel nodes. All programs
that use libpm must include the header file pxm.h.

Both pipe and pixel nodes use the command data structure as defined in Figure 2-2 and pxm.h.
Pipe nodes read commands from their input FIFO in two stages, first the opcode and argument
count (using PMgetop()) and then the arguments themselves (via PMgetdata()). Pixel nodes read
all three command components at once by calling PMgetcmd().


Figure 2-2: PMcommand data structure

#include "pxm.h"

typedef struct {
    short opcode;       /* command opcode */
    short count;        /* number of parameters */
    float *data_ptr;    /* pointer to the parameter list */
} PMcmdtype;

extern PMcmdtype PMcommand;





Pipe nodes write commands to their output FIFO by calling either PMputcmd(), which writes an 
entire command, or PMputop() followed by PMputdata() if the argument count is greater than 
zero. Pixel nodes cannot send commands. 
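
The two-stage read performed by a pipe node can be illustrated with a simulated FIFO held in an array. This sketch runs on any host and does not call libpm: struct fifo, struct command, get_op and get_data are hypothetical names, and the header layout (opcode in the upper 16 bits, count in the lower 16 bits of one word) is an assumption made only so the example is complete.

```c
#include <stddef.h>
#include <stdint.h>

struct fifo {
    const uint32_t *words;  /* simulated FIFO contents */
    size_t length;          /* number of words available */
    size_t next;            /* index of the next unread word */
};

struct command {
    short opcode;
    short count;
    const uint32_t *data;   /* points at the parameters once read */
};

/* Stage 1: read the header word. Returns 1 on success, 0 if the FIFO is empty. */
int get_op(struct fifo *f, struct command *cmd)
{
    uint32_t header;

    if (f->next >= f->length)
        return 0;
    header = f->words[f->next++];
    cmd->opcode = (short)(header >> 16);      /* assumed layout */
    cmd->count = (short)(header & 0xFFFFu);   /* assumed layout */
    cmd->data = NULL;
    return 1;
}

/* Stage 2: consume cmd->count parameter words. Returns 1 on success,
 * 0 if the FIFO does not hold that many words. */
int get_data(struct fifo *f, struct command *cmd)
{
    if (cmd->count < 0 || (size_t)cmd->count > f->length - f->next)
        return 0;
    cmd->data = cmd->count > 0 ? &f->words[f->next] : NULL;
    f->next += (size_t)cmd->count;
    return 1;
}
```

Splitting the read lets a node inspect the opcode and count before deciding what to do with the parameters, which is the reason a pipe node can forward or discard a command without interpreting its data.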


Functions for Pipe and Pixel Nodes 


This section describes routines that are useful in both pipe and pixel node programs. 


A global variable, PMsem, is used as a semaphore by the host and node to synchronize DMA 
accesses. PMsetsem() and PMwaitsem() are the synchronizing primitives that set and test the 
semaphore. 


Recall that each node has a PIR register, written by the node and read by the host. PMoutpir() is 
the function that writes a value into the register. If necessary, it waits until the previous value has 
been read by the host. The PIR register is also written by the PMusermsg() routine, which sends a 
user-defined message to the host. 
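
The PIR handshake behaves like a one-slot mailbox: the node may write a new value only after the host has read the previous one. The following sketch models that rule in ordinary C. The struct pir type and both function names are hypothetical, and where a real node would wait, this single-threaded model simply reports that the slot is still occupied.

```c
/* One-slot mailbox standing in for a node's PIR register. */
struct pir {
    int value;  /* last value written by the node */
    int full;   /* nonzero while the host has not yet read value */
};

/* Node side: returns 1 if the value was written, or 0 if the previous
 * value is still pending (where a real node would wait for the host). */
int pir_try_write(struct pir *p, int value)
{
    if (p->full)
        return 0;
    p->value = value;
    p->full = 1;
    return 1;
}

/* Host side: returns 1 and stores the value through out if one is
 * pending, or 0 if the register holds nothing new. */
int pir_read(struct pir *p, int *out)
{
    if (!p->full)
        return 0;
    *out = p->value;
    p->full = 0;
    return 1;
}
```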

Table 2-4: Pipe and Pixel Node Functions

Name               Function
PMcolor_float()    converts internal color value to floating point number
PMcolor_int()      converts internal color value to an integer
PMdelay()          do nothing for a specified time
PMenable()         enable processing of selected system commands
PMfloat_color()    converts floating point value to internal color value
PMhost_exit()      signal DEVpoll_nodes to return to caller
PMint_color()      converts an integer to an internal color value
PMoutpir()         output a value to the PIR register
PMsetsem()         set the semaphore
PMusermsg()        send a user defined message to the host
PMwaitsem()        wait for semaphore to clear
printf()           formatted output conversion on host





Pipe Node Functions 


This section describes routines that can only be used in pipe nodes. Most of them concern reading
commands from and writing commands to the FIFOs.


Table 2-5: Pipe Functions

Name             Function
PMbus_wait()     waits until control of the broadcast bus is granted
PMcopycmd()      copy opcode, parameter count, and data from input to output FIFO of a pipe node
PMfb_off()       direct output commands to the regular output FIFO
PMfb_on()        direct output commands to the feedback FIFO
PMgetdata()      get data from a pipe node FIFO
PMgetop()        get opcode and parameter count from input FIFO of a pipe node
PMputcmd()       write opcode, parameter count, and parameters to the output FIFO of a pipe node
PMputdata()      write parameters to the output FIFO of a pipe node
PMputop()        write opcode and parameter count to the output FIFO of a pipe node
PMswap_pipe()    release the broadcast bus and request it again

Pixel Node Functions


This section describes functions that are only useful in pixel nodes. Most of them are used to read
and write values in the various pixel node memories.


pxm.h includes macro definitions that convert (x,y) screen space coordinates to (i,j) processor space 
coordinates. 


■ PMilo(subscreen,x) returns the smallest processor space integer that, when mapped to screen
space, will be ≥ x.

■ PMihi(subscreen,x) returns the largest processor space integer that, when mapped to screen
space, will be ≤ x.

■ PMjlo(subscreen,y) returns the smallest processor space integer that, when mapped to screen
space, will be ≥ y.

■ PMjhi(subscreen,y) returns the largest processor space integer that, when mapped to screen
space, will be ≤ y.


Figure 2-3 shows some examples of these mappings. Figure 2-3(a) shows an 8x8 corner of the
screen, with some pixels turned on. Figure 2-3(b) shows the pixel to processor mapping for a 4x4
pixel node mesh. Each pixel location is tagged with the number of the processor which keeps that
pixel in its subscreen memory. Figure 2-3(c) shows the (i,j) values for the pixels and Figure 2-3(d)
the individual subscreens. Table 2-6 shows the values associated with calls on the four macros used
to map (x,y) values into (i,j) values. Whenever PMilo(subscreen,x)==PMihi(subscreen,x) and
PMjlo(subscreen,y)==PMjhi(subscreen,y), then the pixel (x,y) is part of the issuing processor’s
subscreen. (Macros PMmyx and PMmyy can be used to abbreviate the equality tests.) The table
shows pixel ownership by using boldface for the values that satisfy the condition.

Figure 2-3: Screen to processor space mapping functions

[Figure panels: (a) screen space; (b) pixel to processor mapping for a 4x4 array; (c) processor space]

Table 2-6: Converting screen space coordinates into processor space

screen  mapping   processor doing the mapping
pixel   macro     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

(3,0)   ILO(3)    1  1  1  1  1  1  1  1  1  1  1  1  0  0  0  0
        IHI(3)    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
        JLO(0)    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
        JHI(0)    0 -1 -1 -1  0 -1 -1 -1  0 -1 -1 -1  0 -1 -1 -1

(4,2)   ILO(4)    1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
        IHI(4)    1  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0
        JLO(2)    1  1  0  0  1  1  0  0  1  1  0  0  1  1  0  0
        JHI(2)    0  0  0 -1  0  0  0 -1  0  0  0 -1  0  0  0 -1

(1,5)   ILO(1)    1  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0
        IHI(1)    0  0  0  0  0  0  0  0 -1 -1 -1 -1 -1 -1 -1 -1
        JLO(5)    2  1  1  1  2  1  1  1  2  1  1  1  2  1  1  1
        JHI(5)    1  1  0  0  1  1  0  0  1  1  0  0  1  1  0  0

(6,6)   ILO(6)    2  2  2  2  2  2  2  2  1  1  1  1  1  1  1  1
        IHI(6)    1  1  1  1  1  1  1  1  1  1  1  1  0  0  0  0
        JLO(6)    2  2  1  1  2  2  1  1  2  2  1  1  2  2  1  1
        JHI(6)    1  1  1  0  1  1  1  0  1  1  1  0  1  1  1  0
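
The four mapping macros amount to ceiling and floor divisions. The sketch below is not the pxm.h implementation; it is a portable illustration under stated assumptions: a 4x4 mesh, screen column x held by mesh column (x mod 4) at processor index i = x div 4, rows handled the same way, and processor number p placed at mesh column p / 4 and mesh row p % 4. These assumptions were chosen to agree with the values in Table 2-6. The names map_lo, map_hi and owns_pixel are hypothetical.

```c
#define MESH 4  /* nodes per mesh row and per mesh column (assumed) */

/* Floor division for a positive divisor, correct for negative numerators. */
static int floor_div(int a, int b)
{
    int q = a / b;  /* C division truncates toward zero */
    return (a % b != 0 && a < 0) ? q - 1 : q;
}

/* Ceiling division for a positive divisor. */
static int ceil_div(int a, int b)
{
    int q = a / b;
    return (a % b != 0 && a > 0) ? q + 1 : q;
}

/* Smallest index whose screen coordinate MESH*index + offset is >= s. */
int map_lo(int offset, int s) { return ceil_div(s - offset, MESH); }

/* Largest index whose screen coordinate MESH*index + offset is <= s. */
int map_hi(int offset, int s) { return floor_div(s - offset, MESH); }

/* A node owns pixel (x,y) exactly when lo == hi in both directions. */
int owns_pixel(int node, int x, int y)
{
    int col = node / MESH;  /* mesh column, as implied by Table 2-6 */
    int row = node % MESH;  /* mesh row, as implied by Table 2-6 */

    return map_lo(col, x) == map_hi(col, x) &&
           map_lo(row, y) == map_hi(row, y);
}
```

When a pixel is not in a node's subscreen, the low value exceeds the high value by one, so a loop from lo to hi over that node's indices does no work. This is why the equality test identifies ownership.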





It is also possible to reverse the mapping. That is, the PMxat and PMyat macros can be used in a 
pixel node program to convert (i,j) into (x,y) coordinates. 


There are three macros that map the coefficients in linear expressions from screen space to processor 
space. 


■ PMfxtoi(subscreen,A,B) converts an expression of the form Ax+B to one of the form
A’i+B’. The macro modifies the values of A and B.

■ PMfytoj(subscreen,A,B) converts an expression of the form Ay+B to one of the form
A’j+B’. The macro modifies the values of A and B.

■ PMfxytoij(subscreen,A,B,C) converts an expression of the form Ax+By+C to one of the
form A’i+B’j+C’. The macro modifies the values of A, B, and C.
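
The coefficient conversion follows from substituting the screen coordinate into the expression. Assuming a node's pixels satisfy x = nx*i + ox and y = ny*j + oy (nx and ny being the mesh dimensions, ox and oy the node's offsets, all of which are assumptions about the interleaving rather than facts taken from pxm.h), Ax+B becomes (A*nx)i + (A*ox + B). The functions below are a hypothetical sketch of that algebra and, like the macros, overwrite their coefficient arguments.

```c
/* Rewrite A*x + B as A'*i + B' for a node with x = nx*i + ox.
 * The constant term is updated first because it needs the old A. */
void fx_to_fi(int nx, int ox, float *A, float *B)
{
    *B = *A * (float)ox + *B;   /* B' = A*ox + B */
    *A = *A * (float)nx;        /* A' = A*nx */
}

/* Rewrite A*x + B*y + C as A'*i + B'*j + C' for a node with
 * x = nx*i + ox and y = ny*j + oy. */
void fxy_to_fij(int nx, int ox, int ny, int oy,
                float *A, float *B, float *C)
{
    *C = *A * (float)ox + *B * (float)oy + *C;  /* C' = A*ox + B*oy + C */
    *A = *A * (float)nx;                        /* A' = A*nx */
    *B = *B * (float)ny;                        /* B' = B*ny */
}
```

For example, with nx = 4 and ox = 3, the expression 2x+1 becomes 8i+7: at i = 2 the node's pixel is x = 11, and both forms evaluate to 23.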

Figure 2-4: LEDs on the pixel node boards

Each pixel machine has FLAG and READY signals that are connected to LEDs on the pixel node
processor board (see Figure 2-4). FLAG is used by PMpsync() and READY is used by
PMvsync() and PMrdyoff(). They can also be set by the user (whenever the sync routines are
not required by the program) with the PMrdyled() and PMflagled() routines in libpm. The
signals and LED displays can be useful for debugging to identify states in the program.


Note that the LEDs are inverted logic. That is, when the signal is off, the light is on. When the 
signal is on, the light is off. 







Writing Programs for the Pipe and Pixel Nodes 


Table 2-7: Pixel Node Functions

Name                  Function
PMapply               apply a function to all subscreens
PMclear               fill a rectangular region of the screen
PMcopy_f              fast but dangerous 32 bit D/VRAM copy
PMcopy_s              safe 32 bit DRAM or VRAM copy
PMcopy_v              32 bit copy with variable increments
PMcopyftob            copy front to back
PMcopyvtov            copy blocks of VRAM
PMcopyvtoz            copy video RAM to DRAM
PMcopyztov            copy DRAM to video RAM
PMcopyztoz            copy from one section of DRAM to another
PMdblbuff             enable double buffering mode
PMflagled             turn the DEV_FLAG LED on or off
PMfreezaddr           decrement references to a page register
PMfxtoi               map a linear function of x from screen space to processor space i
PMfxytoij             map a linear function of x and y from screen space to processor space i and j
PMfytoj               map a linear function of y from screen space to processor space j
PMgetcmd              read command from a pixel node FIFO
PMgetpix              read a pixel from the current buffer
PMgetrow, PMgetcol    read a scanline from pixel memory without subscreens
PMgetscan             read a scanline from pixel memory
PMgetzaddr            assign address to a section of DRAM
PMgetzbuf             read a float value from the Z buffer
PMgetzdesc            allocate DRAM
PMihi                 map from screen space (xmax) to processor space (ihi)
PMilo                 map from screen space (xmin) to processor space (ilo)
PMinterleave          interleave or deinterleave a block
PMjhi                 map from screen space (ymax) to processor space (jhi)
PMjlo                 map from screen space (ymin) to processor space (jlo)
PMmsg_exchange        send and receive data packet over serial links
PMmsg_setup           set serial DMA input pointer
PMmyx                 test if a given screen space coordinate is in processor space
PMmyy                 test if a given screen space coordinate is in processor space
PMpagereg             macros to manipulate page registers used to access video and Z memory
PMpixaddr             generate a pointer to a specific pixel
PMpsync               wait for all pixel processors to synchronize
PMputpix              output a pixel to the current buffer
PMputrow, PMputcol    write a scanline to pixel memory without subscreens
PMputscan             write a scanline to pixel memory
PMputzbuf             write a float value to the z-buffer
PMqcopyztoz           copy from one section of DRAM to another
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Table 2-7: Pixel Node Functions (continued ) 


PMaqget 
PMaqput 
PMqzget 
PMqzput 
PMrdyled 
PMrdyoff 
PMsiodir 
PMsioinit 
PMsngibuff 
PMswapbuff 
PMv0get 
PMv0put 
PMviget 
PMviput 
PMvsync 
PMxat 
PMyat 
PMzaddr 
PMzaddrcol 
PMzbrk 
PMzget 
PMzput 


quick read of a pixel from the current buffer 
quick write of a pixel to the current buffer 
quick read of Z value from the Z buffer 
quick write of Z value to the Z buffer 
turn the DEV_RDY_LED on or off 

turn the ready signal off 

set serial I/O link direction 

initialize serial I/O 

disable double buffering mode 

swap visible and pixel buffers 

read a pixel from buffer 0 

write a pixel to buffer 0 

read a pixel from buffer 1 

write a pixel to buffer 1 

synchronize and wait for vertical retrace 
map subscreen coordinates to screen space 
map subscreen coordinates to screen space 
generate a ZRAM pointer to a row 
generate a ZRAM pointer to a column 
initialize DRAM for allocation 

tread a float from the z-buffer 

write a float to the z-buffer 


TO eee 


Pixel Machine Math Functions 


This section describes the hand-optimized assembly language versions of a few mathematical
subroutines that are included in libpm (see Table 2-8). These routines have been implemented for
the architecture and requirements of the Pixel Machine and will run more efficiently than similar
routines in the other DSP32 libraries. Some of these routines, however, are more restrictive than
the other DSP32 libraries; please see the manual pages in the DEVtools Reference Manual for further
information.
Table 2-8: Math Functions

Name                 Function
PMcos                cosine
PMieee_dsp           convert IEEE floating point number to DSP format
PMidot               specialized dot product for light sources
PMlong_dsp           convert long integer to float
PMnorm               normalize a vector and return its length
PMpow                power function
PMsin                sine
PMsqrt               square root function
PMx_exp_n            integer power function
The DSP32 C Language Compiler Library Reference Manual describes three libraries that contain
routines that can be included in Pixel Machine programs.


The libc Library 


libc is a subset of the standard UNIX system C library and includes functions that support error
handling and debugging. Table 2-9 lists the routines and gives a brief synopsis.


libc is needed even if none of its functions are called explicitly because the compiler needs it for
some operations, for example, cases, mod and integer divide. Some of the routines require header
files, including math.h, memory.h, stdio.h and string.h. Refer to the individual manual pages in
the DSP32 C Language Compiler Library Reference Manual for more information.


Table 2-9: DSP32 libc

Name                          Function
ecvt                          convert a floating point number to a string
isalnum, isalpha, isascii,    classify characters
iscntrl, isdigit, isgraph,
islower, isprint, ispunct,
isspace, isupper, isxdigit
frexp                         separate mantissa and exponent in a floating point number
ldexp                         combine mantissa and exponent into a floating point number
memchr                        find first occurrence of a character in a block of memory
memcpy                        copy a block of memory
modf                          separate integral and fractional parts of a floating point number
perror                        print system error messages
                              (only for use with the d3sim simulator)
printf                        print formatted output
                              (only for use with the d3sim simulator)
                              (the libpm version of printf should be used for Pixel Machine programs)
strlen                        return the length of a string
libm provides all of the functions found in the standard UNIX system math library. Table 2-10 lists
and briefly describes them. Note that these routines do error checking and are not optimized for the
DSP32; therefore they are rather slow. Whenever possible, use an alternative function from libap or
libpm.


To use the libm routines, include math.h and load the program with -lm.


Table 2-10: DSP32 libm

Name                 Function
acos                 arccosine
asin                 arcsine
atan, atan2          arctangent
ceil                 ceiling function
cos, qcos            cosine
erf, erfc            error functions
exp                  exponential
fabs                 absolute value
floor                floor function
fmod                 remainder function
gamma                Gamma function
hypot                Euclidean distance
j0, j1, jn           Bessel functions
log, log10           logarithms
matherr              error handling
pow                  power function
sin, sinh            sine
sqrt                 square root
tan, tanh            tangent
y0, y1, yn           Bessel functions
The libap Library


libap is a library of routines that have been written and optimized for the DSP32 processor, and
includes functions for mathematics, matrix manipulation, filtering, and imaging. The routines are
listed in Table 2-11.


To use the libap routines, include libap.h and load the program with -lap. Some of the
mathematical functions appear in both the math library and the applications library. The routines
in libap have been hand-optimized for the DSP32 and will run faster than the libm versions. The
DSP32 C Language Compiler Library Reference Manual contains a section describing how to use both
libraries in the same program.
Table 2-11: DSP32 libap

Name                          Function
acos, qacos                   arccosine
alog10, alog2, aloge          anti-logarithms
asin, qasin                   arcsine
atan, qatan                   arctangent
cos, qcos                     cosine
div, divf                     quotient
dsp32                         convert from IEEE to DSP floating point format
ieee32                        convert from DSP to IEEE floating point format
inv, invf                     inverse
invsqr                        inverse of the square root
log10, qlog10, log2, loge     logarithms
ran, gran                     random number generators
sin, qsin                     sine
sqrt, qsqrt, sqrtf, sqrtq     square root
tan, qtan                     tangent
xtoy                          x to the power y
matin2, matinf, matinv        matrix inversion
matmul, mat2x2, mat3x3,       matrix multiplication
mat4x1, mat4x1f,
mat4x4, mat5x5
fir, fir5, fire               finite impulse response filters
iir, iir2, iir3, iir4,        infinite impulse response filters
iird, iirt, iirt1,
iirt2, iirt3, iirt4
lms, lmsc, lmsl               real adaptive FIR filters using
                              least-mean-square algorithm
fft                           fast Fourier transform
hamm, hamm0, hamm1, chamm0    multiply by Hamming window
hann, hann0, hann1, chann0    multiply by Hanning window
Host/Node Communication


Introduction 


Most applications that run on the Pixel Machine must communicate with the host system in order to 
receive the data to be processed, to return results, or to perform operations, such as I/O, that the 
Pixel Machine processors cannot perform on their own. 


This section describes the ways in which host programs and Pixel Machine programs communicate. 
It is divided into sections that describe: 


■ communication to the Pixel Machine using the DEVtools command protocol
■ communication from the Pixel Machine to the host using the message passing protocol
■ other communication using a user-defined protocol
The DEVtools Command Protocol


A complete Pixel Machine program is one that uses all the architectural components of the Pixel
Machine, and consists of:


■ a controlling host program
■ DSP programs running in the pipe nodes
■ DSP programs running in the pixel nodes


Host programs usually command the Pixel Machine by sending data using the DEVtools command 
protocol, which is a convention for passing data from the host to the pipe(s) in the Pixel Machine. 
The data is sent from the host to the first pipe node in the Pixel Machine in units called commands. 
The pipe nodes can modify, delete, or pass on the command packets unmodified to the next node. 
They can also generate new packets to be broadcast to the pixel nodes. 


Each command consists of an opcode, an operand count, and the operands. The opcode and operand
count are encoded into a single 32-bit value. The operands are 32-bit quantities that can be
integers, host floating point values, or Pixel Machine floating point values.

The format of commands on the host is:

    OPCODE COUNT   PARAM[1]   ...   PARAM[count]

Macros are provided to simplify the generation and processing of commands on the host. These
macros are used to write commands to the Pixel Machine pipelines and read commands back from
the feedback FIFO. These macros are defined in devcommand.h (see the DEVtools Reference
Manual). The following is an example of the code required to generate a command:


    DEVcwrite2(DEVcommand(opcode, 2), int, some_data, more_data);


DEVcommand is used to encode an opcode and parameter count into a 32-bit command code.
The command argument of the DEVcwrite macros is usually a call to DEVcommand. opcode is
a user-defined positive value. It is only important that the host and Pixel Machine routines agree
on the meaning of the opcodes and the format of the operands that follow each opcode. The 2 in the
DEVcommand macro is the number of operands that follow this command. This is frequently the
same as the last character of the macro name, but not always, because multiple write macro
invocations are required for commands that contain operands of more than one data type. int
is the type of the operand to be passed to the Pixel Machine. some_data and more_data are
expressions that are used as the values of the operands.


Following is an example of a command that contains both integer and floating point operands:


    DEVcwrite2(DEVcommand(opcode, 4), float, x, y);
    DEVwrite2(int, i, j);


To send an opcode cmd with no parameters use:


    DEVcwrite0(DEVcommand(cmd, 0));
To send an opcode with one integer argument, argi, use:

    DEVcwrite1(DEVcommand(cmd, 1), int, argi);

To send an opcode with one float argument, argf, use:

    DEVcwrite1(DEVcommand(cmd, 1), float, argf);


There are separate macros to write from 0 to 9 parameters. If the number of parameters will not be
known until run-time, use DEVcwriten. For example:


    DEVcwriten(DEVcommand(cmd, length), float, flt_array, length);


If possible, it is best to use the individual macros because they are more efficient than
DEVcwriten.


The DEVcwrite0 through DEVcwrite9 macros are used to write a command and the number of
operands that matches the last character of the macro name. The DEVwrite0 through DEVwrite9
macros only write operands; they do not output a command code.


DEVcommand_opcode and DEVcommand_length are used with the DEVreadn macros to 
extract the opcode and length from the encoded value when reading from the feedback FIFO. 


On systems with multiple pipes configured in parallel, the macros write to whichever of the pipes is
the current pipe. Commands can be written to the alternate pipe by using the macros ending with
the string _alt. The _alt macros must not be used on single pipe systems or on multi-pipe systems
whose pipes are configured in series.


A few more details about the command formats (these are all taken care of by the macros): The
DEVcommand macro turns the count into a negative byte count and packs it into one word
together with the opcode. The byte ordering on the Sun and the Pixel Machine is also different, so
when sending bytes or 16-bit ints packed into a 32-bit parameter, it is necessary to do some
swapping. The floating point format is also different, and the conversion must be done explicitly
either on the host or in the nodes. It is usually more efficient to do the float conversion in the
pipe nodes.
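

The packing described above can be illustrated with a small sketch. It is not the DEVcommand macro itself: the field layout (opcode in the upper 16 bits, negated byte count in the lower 16 bits) is an assumption chosen to match the two short fields of the PMcommand structure, and every sketch_ name is hypothetical. The byte-swap helper shows the kind of reordering needed when the host and the nodes disagree on byte order.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout, for illustration only:
 *   bits 31..16  opcode
 *   bits 15..0   negated byte count (two's complement), 4 bytes per operand
 * The real DEVcommand macro in devcommand.h may arrange the fields differently. */
static uint32_t sketch_command(int opcode, int nparams)
{
    uint32_t neg_bytes;

    assert(opcode > 0 && opcode <= 0x7FFF);
    assert(nparams >= 0 && nparams <= 0x1FFF);
    neg_bytes = (uint32_t)(-(nparams * 4)) & 0xFFFFu;
    return ((uint32_t)opcode << 16) | neg_bytes;
}

/* Recover the opcode from a packed command word. */
static int sketch_opcode(uint32_t cmd)
{
    return (int)(cmd >> 16);
}

/* Recover the operand count from the negated byte count. */
static int sketch_length(uint32_t cmd)
{
    int32_t low = (int32_t)(cmd & 0xFFFFu);

    if (low >= 0x8000)
        low -= 0x10000;          /* sign-extend the 16-bit field */
    return (int)(-low / 4);
}

/* Reverse the byte order of a 32-bit word. */
static uint32_t sketch_swap32(uint32_t w)
{
    return ((w & 0x000000FFu) << 24) |
           ((w & 0x0000FF00u) << 8)  |
           ((w & 0x00FF0000u) >> 8)  |
           ((w & 0xFF000000u) >> 24);
}
```

Under these assumptions, a command with opcode 7 and two operands carries a byte count of -8 in its low half, and sketch_length recovers the operand count 2 from it.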


From the point of view of the pipe nodes, the command packets are read one 32-bit word at a time
from the input FIFO and possibly written to the output FIFO. In the nodes, a set of functions
(PMgetcmd, PMgetdata, PMgetop, PMputcmd, PMputdata and PMputop, described below) is
provided for efficient reading and writing of the hardware FIFOs. All the FIFO routines use a data
structure called PMcommand that holds the command packets.


The PMcommand structure is defined in the pxm.h file, and is as follows:

    #include <pxm.h>

    typedef struct
    {
        short opcode;
        short count;
        float *data_ptr;
    } PMcmdtype;

    extern PMcmdtype PMcommand;


The global data structure, PMcommand, defined in both the pipe and pixel node libraries, reflects
this packet structure. The members of this structure have the following functionality:


■ PMcommand.opcode: contains the user-defined opcode.


■ PMcommand.count: contains the negated byte count of the parameters pointed to by the
  next field.


■ PMcommand.data_ptr: points to a static buffer containing the parameters. It is initialized
  by the system, although this can be changed to point to a user-defined buffer. The location
  of this buffer is specified to optimize the DSP32 data move instruction.


Reading Commands from the Input FIFO

Pipe node programs read a command in two steps:


1. call PMgetop() to load an opcode and count from the input FIFO into the PMcommand
   structure.


2. if the parameter count is nonzero, call PMgetdata() to load parameters from the input FIFO
   into the PMcommand structure.


Pixel nodes read a command by calling PMgetcmd(), which loads all three components of the
command into PMcommand.


Writing Commands to the Output FIFO

Pipe node programs can write a command in two ways:

1. by calling PMputop() followed (if count is nonzero) by PMputdata().


2. by calling PMputcmd(), which combines the functionality of PMputop() and PMputdata().


By changing members of the PMcommand structure, a pipe node program can modify the command
stream as needed. Pixel node programs read commands from the pipe nodes but cannot write
commands.


Data flows through the system in the following manner: the program on the host assembles com- 
mand packets which consist of an opcode, count, and data and sends them to the first pipe node via 
the first input FIFO. The pipe node (and subsequently the rest of the pipe nodes) reads the opcode 
and decides what to do with it. There are three scenarios: 


1. it could simply pass the command packet to the next node via the output FIFO 
2. it could do some processing and consume the command packet without passing it on 


3. it could do some processing of the data and then pass it on to the next node. It can use the 
same opcode or change it to another one. It could alter the data (e.g., convert IEEE format 
floats to DSP format) or change it entirely, even passing on several new command packets. 


After the last pipe node processes its commands, it writes the packets to its output FIFO which is 
broadcast to the input FIFOs of all the pixel nodes. Each pixel node can then read the packets and 
process the opcode and parameters according to its algorithm. 
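

The three scenarios can be sketched as a host-side simulation of one pipe node. This is only an illustration of the control flow: the packet type, the opcodes and the array-based "FIFOs" are all hypothetical stand-ins, and a real pipe node program would use PMgetop, PMgetdata and PMputcmd on the PMcommand structure instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; real applications define their own. */
enum { OP_DRAW = 1, OP_SETUP = 2, OP_SCALE = 3 };

#define SKETCH_MAX_PARAMS 4

/* Simplified packet: count holds the operand count here, whereas the
 * real PMcommand.count holds a negated byte count. */
typedef struct {
    short opcode;
    short count;
    float data[SKETCH_MAX_PARAMS];
} SketchPacket;

/* Process n_in packets from "in" and write the surviving packets to "out".
 * Returns the number of packets written. */
static size_t sketch_pipe_node(const SketchPacket *in, size_t n_in,
                               SketchPacket *out, size_t out_cap)
{
    size_t n_out = 0;
    size_t i;
    int k;

    for (i = 0; i < n_in; i++) {
        SketchPacket p = in[i];

        assert(p.count >= 0 && p.count <= SKETCH_MAX_PARAMS);
        switch (p.opcode) {
        case OP_SETUP:
            /* scenario 2: consume the packet, nothing is passed on */
            continue;
        case OP_SCALE:
            /* scenario 3: alter the data, then pass the packet on */
            for (k = 0; k < p.count; k++)
                p.data[k] *= 2.0f;
            break;
        default:
            /* scenario 1: pass the packet on unmodified */
            break;
        }
        assert(n_out < out_cap);
        out[n_out++] = p;
    }
    return n_out;
}
```

Feeding the function one OP_DRAW, one OP_SETUP and one OP_SCALE packet yields two output packets: the OP_DRAW packet unchanged and the OP_SCALE packet with its operands doubled.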
During the course of processing images on the Pixel Machine, it is often necessary to download an
image from the host to the Pixel Machine (e.g., to display or perform some processing) or upload an
image from the Pixel Machine to the host (e.g., to save a result). Because upload/download of a
full image requires moving over 4Mbytes of information, it is desirable to accomplish this task as
quickly as possible.


Two DEVtools routines, DEVget_scan_line and DEVput_scan_line, have been designed to provide
users with fast and flexible image upload/download processing. Both routines take care of the
pixel interleaving/de-interleaving. To process images quickly, both DEVget_scan_line and
DEVput_scan_line require cooperating code to be executed on the Pixel Machine. To perform the
upload/download, the host routines send system commands to the Pixel Machine. When a program
executing on a pipe/pixel node receives a system command via PMgetop or PMgetcmd, it checks
to see if the command was "enabled" by PMenable. If the command was not enabled, the node
takes no action and passes the command on to the next node (in the case of pipe nodes). If the
command was enabled, the appropriate DEVtools routine is called to process the command. After
the system command is processed, control is passed back to PMgetop or PMgetcmd to receive the
next user command.


Both DEVget_scan_line and DEVput_scan_line take a mode argument that specifies the format 
of an individual pixel and which portion of Pixel Machine memory the pixels should be 
uploaded/downloaded from/to. The following pixel formats are supported: 


■ DEV_RGBA_PACKED_PIXELS - on the host each pixel is 4 bytes long and the red pixel
  component is stored in the first byte (the byte at the lowest memory address).


■ DEV_RGB_PACKED_PIXELS - on the host each pixel is 3 bytes long and the red pixel
  component is stored in the first byte (the byte at the lowest memory address). When using
  DEV_RGB_PACKED_PIXELS in ZRAM, it is assumed that pixels are stored in RGBA format;
  upload therefore transfers 3 bytes, skips one, transfers the next 3 bytes, and so on.


■ DEV_MONO_PIXELS - on the host each pixel is one byte long. When downloaded to
  VRAM, the pixel component is placed in each of the red, green, blue and alpha components.
  When uploaded/downloaded from/to ZRAM, pixels occupy consecutive bytes.


■ DEV_MONO_R_PIXELS - on the host each pixel is one byte long. When
  uploaded/downloaded from/to VRAM, the pixel component is placed in the red component of
  a pixel. Other pixel components are left untouched. ZRAM upload/download is not
  supported.


■ DEV_MONO_G_PIXELS - on the host each pixel is one byte long. When
  uploaded/downloaded from/to VRAM, the pixel component is placed in the green component
  of a pixel. Other pixel components are left untouched. ZRAM upload/download is not
  supported.


■ DEV_MONO_B_PIXELS - on the host each pixel is one byte long. When
  uploaded/downloaded from/to VRAM, the pixel component is placed in the blue component
  of a pixel. Other pixel components are left untouched. ZRAM upload/download is not
  supported.


■ DEV_MONO_A_PIXELS - on the host each pixel is one byte long. When
  uploaded/downloaded from/to VRAM, the pixel component is placed in the alpha component
  of a pixel. Other pixel components are left untouched. ZRAM upload/download is not
  supported.


■ DEV_MONO_16_PIXELS - on the host each pixel is two bytes long. When
  uploaded/downloaded from/to ZRAM, each pixel occupies successive ints. VRAM
  upload/download is not supported.


■ DEV_DSP_FLOAT_PIXELS - on the host each pixel is four bytes long. When
  uploaded/downloaded from/to ZRAM, each pixel occupies successive floats. VRAM
  upload/download is not supported.


■ DEV_IEEE_FLOAT_PIXELS - on the host each pixel is four bytes long. When
  uploaded/downloaded from/to ZRAM, each pixel occupies successive floats. During the
  download operation, each pixel (float) is treated as an IEEE floating point number and
  converted to the DSP internal floating point format. During the upload operation, each pixel
  (float) is treated as a DSP floating point number and converted to the IEEE floating point
  format. VRAM upload/download is not supported.
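

To make the byte layouts of two of the formats above concrete, here is a minimal sketch of the repacking they imply. These helpers are hypothetical and are not part of DEVtools; DEVget_scan_line and DEVput_scan_line perform the real transfers. The first function mirrors the "3 bytes, skip one" rule for packed RGB taken from RGBA storage, and the second mirrors the replication of a mono value into all four components.

```c
#include <assert.h>
#include <stddef.h>

/* RGBA storage (4 bytes per pixel) to packed RGB (3 bytes per pixel):
 * copy three bytes, skip the fourth, and repeat. */
static void sketch_rgba_to_rgb(const unsigned char *rgba, unsigned char *rgb,
                               size_t npixels)
{
    size_t i;

    assert(rgba != NULL && rgb != NULL);
    for (i = 0; i < npixels; i++) {
        rgb[3 * i + 0] = rgba[4 * i + 0];   /* red (lowest address) */
        rgb[3 * i + 1] = rgba[4 * i + 1];   /* green */
        rgb[3 * i + 2] = rgba[4 * i + 2];   /* blue */
        /* rgba[4 * i + 3] is skipped */
    }
}

/* One-byte mono pixels to RGBA: the value is placed in each of the
 * red, green, blue and alpha components. */
static void sketch_mono_to_rgba(const unsigned char *mono, unsigned char *rgba,
                                size_t npixels)
{
    size_t i;
    int c;

    assert(mono != NULL && rgba != NULL);
    for (i = 0; i < npixels; i++)
        for (c = 0; c < 4; c++)
            rgba[4 * i + c] = mono[i];
}
```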


In addition to the above pixel formats, the mode argument also specifies the area in Pixel Machine
memory to upload/download from/to:


■ DEV_FRONT_BUFFER - the currently visible portion of VRAM. Typically used to
  display an image on the monitor.


■ DEV_BACK_BUFFER - the currently non-visible portion of VRAM. Typically used to
  upload/download an image while another image is being displayed on the monitor.


■ DEV_VRAM0_BUFFER - the VRAM0 portion of VRAM (only available on models 932
  and above). Typically used to store an image that is larger than the size of the screen. Note
  that VRAM0 is the union of the FRONT and BACK buffers.


■ DEV_VRAM1_BUFFER - the VRAM1 portion of VRAM (only available on models 932
  and above). Typically used to store an image that is larger than the size of the screen. Note
  that VRAM1 is not directly visible.


■ DEV_ZRAM_BUFFER - non-displayable dynamic RAM used typically for storing Z buffer
  values (ZRAM). Typically used to perform numerical calculations on image data.



Image Upload and Download 


Note that for all forms of VRAM, the subscreen concept is used, but for ZRAM upload/download 
the subscreen concept is not used. This allows for more efficient use of ZRAM when performing 
image processing. 


The following table gives the size (in pixels) of the largest image that can be stored in each of the
above buffers:


Model   FRONT       BACK        VRAM0       VRAM1       ZRAM
916     1024x1024   1024x1024   -           -           1024x1024
920     1280x1024   1280x1024   -           -           1280x1024
932
940
964     2048x1024   2048x1024


Note that for ZRAM the numbers given in the above table should be multiplied by 2 for 
DEV_MONO_16_PIXELS and by 4 for DEV_MONO_PIXELS. 


When enabling reception of system commands using PMenable, the user has the choice of enabling
upload/download for all memories, just VRAM or just ZRAM. Because program space on the
pixel nodes is limited, it is recommended that users enable only the smallest set that they need. If
users run out of program space and still wish to perform image upload/download, the
DEVget_pixel and DEVput_pixel routines can be used (albeit more slowly).
It is often necessary for the processors in the Pixel Machine to initiate communication back to the
host system. This is required to perform tasks that can only be done on the host, such as input and
output operations.


The protocol used to send messages to the host has the following steps:

■ the node checks its semaphore to see if any previous operation has completed

■ the node loads a message code into the PIR; the semaphore is set

■ meanwhile, the host is polling the PIR registers of a designated set of pipe and/or pixel
  nodes. When a message code is found in a node's PIR, it is used as an index into an array
  of function pointers initialized by calling DEVuser_msg_enable()

■ message specific code is executed on the host to perform any other communication related to
  this message

■ if the message operation must complete before execution continues, then the node process
  must wait for the semaphore to be cleared by the host


The message protocol is used for communications operations that are internal to DEVtools (such as
host communication required by the printf routine) as well as to implement user message routines.
Message codes that are used internally by DEVtools are known as system messages. Message
codes that are used for user defined functions are known as user messages.


Functions are provided to send a message code to the host and to check to see if the semaphore has 
been cleared. The PMusermsg function checks to see if the semaphore is clear, and then sets the 
semaphore and loads the PIR. It has one argument, the message code (a positive integer from 1 to 
256) to be sent to the host. 


Upon receiving a message code, the host will perform the action requested and then clear the sema- 
phore. The Pixel Machine program can continue execution after sending the message code, or it 
may wait for the semaphore if its processing requires that the host action be completed before exe- 
cution can continue. 
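

The handshake between a node and the host can be sketched as a small simulation. Everything here is a stand-in: the structure models only a PIR value and a semaphore, the sketch_ functions are hypothetical, and the send function returns 0 when the previous message is still pending, which is an assumption made for the simulation rather than the documented behavior of PMusermsg.

```c
#include <assert.h>

/* Stand-in for the per-node state used by the message protocol. */
typedef struct {
    int pir;    /* message code waiting for the host; 0 means empty */
    int sem;    /* software semaphore; nonzero while a message is pending */
} SketchNode;

/* Node side: post a message code if the previous operation has completed.
 * Returns 1 if the code was posted, 0 if the semaphore was still set. */
static int sketch_send(SketchNode *n, int code)
{
    assert(code >= 1 && code <= 256);
    if (n->sem)
        return 0;
    n->pir = code;      /* load the message code */
    n->sem = 1;         /* mark the operation as pending */
    return 1;
}

/* Host side: poll one node once. Returns the code that was served,
 * or 0 if no message was pending. */
static int sketch_poll(SketchNode *n)
{
    int code = n->pir;

    if (code == 0)
        return 0;
    /* message-specific host work would run here */
    n->pir = 0;
    n->sem = 0;         /* clearing the semaphore releases the node */
    return code;
}

/* Node side: nonzero once the host has completed the last message. */
static int sketch_done(const SketchNode *n)
{
    return n->sem == 0;
}
```

A node that needs the host action to finish before continuing would loop on sketch_done after sketch_send; a node that does not can carry on immediately.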


In order for the host to serve the message requests, a process on the host must poll the Pixel 
Machine processors for pending messages. The polling processing is designed to be incorporated 
into a user’s program that runs on the host. In this way, the message serving functions can be com- 
bined with other host processing that may include other operations such as generating pipe com- 
mands using the command protocol described previously. 


DEVpoll_nodes is the function that polls the Pixel Machine processors. The user program may
poll a single node, all nodes, or a range of nodes. Both pipe nodes and pixel nodes may be polled.
The number of times that the processors are to be polled and the delay time between polls are
arguments to DEVpoll_nodes. To poll continuously, the value DEV_FOREVER can be used. The
delay time is used as an argument to the usleep() system call. If no delay is wanted, DEV_NONE
should be used. DEV_NONE should only be used for host serving applications that need to be able
to serve Pixel Machine requests very quickly, because the host process using DEV_NONE will
consume as much CPU time as it can get.
For DEVpoll_nodes to recognize a user message code, that specific message code must be
enabled. This is done by calling DEVuser_msg_enable. For example:


    DEVuser_msg_enable(1, pipe_function, pixel_function);
    DEVuser_msg_enable(2, NULL, pixel_func_2);


In the first line of the example, user message code 1 is enabled. If message code 1 is received from
a pipe node, a call to pipe_function is made. If message code 1 is received from a pixel node, a
call to pixel_function is made. In the second line, if message code 2 is received from a pipe node,
an error will be generated. If a message code is received for a code that has not been enabled, an
error is also generated. pipe_function, pixel_function, and pixel_func_2 represent user-written
routines that are called when the specified message code is received. The following is a sample of
the declarations for a user-written message handler:


    int pipe_function(opcode, pixel_system, node)
    int opcode;
    DEVpixel_system *pixel_system;
    int node;


    int pixel_function(opcode, pixel_system, node)
    int opcode;
    DEVpixel_system *pixel_system;
    int node;


opcode is the user message code that caused the message handler to be called. This allows the
message handler to know which code was received so that one function can be used to handle several
message codes. pixel_system is a pointer to a system descriptor and is returned by DEVinit. node
is the number of the node that sent the message code. It is possible to have a single function serve
both the pipe and pixel nodes.
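

The dispatch behavior described above (a table indexed by message code, separate pipe and pixel handlers, and an error for a missing handler) can be sketched as follows. This is not the DEVtools implementation: the table, the sketch_ functions and the use of -1 as the error result are all assumptions, and a void pointer stands in for the DEVpixel_system descriptor.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CODE 256

/* Same argument order as the user-written handlers: code, system, node. */
typedef int (*SketchHandler)(int opcode, void *pixel_system, int node);

typedef struct {
    SketchHandler pipe_fn;
    SketchHandler pixel_fn;
} SketchEntry;

/* One entry per message code; entries start out with no handlers. */
static SketchEntry sketch_table[SKETCH_MAX_CODE + 1];

/* Register handlers for one message code; either handler may be NULL. */
static void sketch_enable(int code, SketchHandler pipe_fn, SketchHandler pixel_fn)
{
    assert(code >= 1 && code <= SKETCH_MAX_CODE);
    sketch_table[code].pipe_fn = pipe_fn;
    sketch_table[code].pixel_fn = pixel_fn;
}

/* Call the handler for a received code. Returns the handler's result, or
 * -1 if the code is out of range or has no handler for this node type. */
static int sketch_dispatch(int code, int from_pixel, void *pixel_system, int node)
{
    SketchHandler h;

    if (code < 1 || code > SKETCH_MAX_CODE)
        return -1;
    h = from_pixel ? sketch_table[code].pixel_fn : sketch_table[code].pipe_fn;
    if (h == NULL)
        return -1;
    return h(code, pixel_system, node);
}

/* A single handler that can serve several codes and both node types. */
static int sketch_echo(int opcode, void *pixel_system, int node)
{
    (void)pixel_system;
    return opcode * 100 + node;
}
```

Registering sketch_echo for both slots of code 1 and only the pixel slot of code 2 reproduces the two example lines: code 2 from a pipe node is then reported as an error.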


Once a message code has been received, it is often necessary for the server routine to communicate 
with a processor to send or receive other information. Other communication may be performed 
using the low-level Pixel Machine control library functions. These functions provide routines that 
perform DMA I/O, read from the PIR, write using the PDR, and provide other monitoring and con- 
trol functions. 


Message serving routines may use any of the DEVtools routines to transfer data to and from the 
processor. The message server routine and the message sending routine must agree on how data is 
transferred, and how much data is transferred. If the sender and receiver get out of sync, it is possi- 
ble for the Pixel Machine or the host system to get caught in a loop waiting for more data, or for 
the host to attempt to interpret data from the Pixel Machine as message codes. 
Message serving routines should not access the software semaphore since this is used by the
message handling routines to indicate that a message operation has been completed. The semaphore
is reset by DEVpoll_nodes after the return from the user message handler.
libpm contains a set of global data that is initialized by the startup code and is available for users.
These variables should be treated as read-only by the user. Corrupting their values would have
destructive effects on many of the library commands.


The following variables are defined for both pipe and pixel node programs. They are all declared
"extern" in pxm.h. Their values are set by hypload, DEVpixel_boot or DEVpipe_boot.











    int PMnode;            /* node identification number [0-63] */
    int PMnx;              /* number of drawing nodes in x [4,8,10] */
    int PMny;              /* number of drawing nodes in y [4,8] */
    int PMox;              /* drawing node's offset in x [0-7] */
    int PMoy;              /* drawing node's offset in y [0-7] */
    char PMsid[10];        /* software name (10 chars) */
    int PMsem;             /* software semaphore */
    int PMmodel;           /* coded pixel machine model */
    int PMvideo;           /* video format code */
    int PMpipe;            /* pipe mode code */
    int PMxmax;            /* maximum x value in screen space */
    int PMymax;            /* maximum y value in screen space */

    PMcmdtype PMcommand;   /* PMcommand struct with Opcode, Count and
                            * DataPtr for reading and writing FIFOs
                            */


The PMcommand structure is used by all the FIFO input and output routines in both pipe and
pixel nodes.


The following variables are defined only for pixel node programs. They should not be referenced
from pipe node programs. If they are, their values will be undefined. Their values are set in the
startup code and depend on the configuration of the machine.

    int PMimax;    /* max pixels in I direction in processor space */
    int PMjmax;    /* max pixels in J direction in processor space */
    int PMmx;      /* more processing in x direction? boolean */
    int PMmy;      /* more processing in y direction? boolean */

    /*
     * Table of Values for PMmx and PMmy
     *
     *   | model | PMmx | PMmy |
     *   |-------|------|------|
     *   | 964   | 0    | 0    |
     *   | 940   | 1    | 0    |
     *   | 932   | 1    | 0    |
     *   | 920   | 1    | 1    |
     *   | 916   | 1    | 1    |
     */
    int PMnindex;  /* total number of subscreens (2*PMmy+PMmx+1) */

    PMsubscrn *PMscrns[4];   /* initialized array of subscreen pointers */


Although there are four PMscrns, only the appropriate ones are initialized for a given model.
PMscrns should be used only as an argument to an appropriate screen function.
Introduction to Subscreens


As described in Chapter 1, each processor contains a portion of the frame buffer memory. For
example, on a 64 processor system displaying a 1024x1024 image, each processor contains 128x128
(16384) pixels of the frame buffer. On systems with fewer than 64 processors, each processor is
responsible for a larger area of the frame buffer. Each processor in a 32 processor system, for
example, is responsible for 128x256 (32768) pixels, while each processor in a 16 processor system
is responsible for 256x256 (65536) pixels.


To provide a simple and uniform interface to the hardware that works for all system configurations, 
the concept of virtual nodes or subscreens was developed. Through the use of subscreens, each pro- 
cessor repeats a set of operations from one to four times, operating on a different ‘‘subscreen’’ or 
portion of the frame buffer each time. In essence, subscreens perform the function of several pro- 
cessors of a larger system. 


On a 64 processor system each processor contains a single subscreen. On smaller configurations 
each processor contains a number of subscreens such that the total number of subscreens is either 64 
or 80: 80 for 20 and 40 processor systems, and 64 for 16 and 32 processor systems. In other words, 
in 32 and 40 processor systems, each processor contains two subscreens; in 16 and 20 processor 
systems, each processor contains four subscreens. 
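
The arithmetic in this section can be checked with a short sketch. The function names below are 
illustrative only and are not part of DEVtools; the sketch assumes nothing beyond the processor 
counts and subscreen totals quoted above. 

```c
/* Illustrative helpers; these are NOT DEVtools routines. */

/* Total subscreens in the system: 80 for 20 and 40 processor
   systems, 64 for 16, 32 and 64 processor systems. */
int total_subscreens(int nproc)
{
    return (nproc == 20 || nproc == 40) ? 80 : 64;
}

/* Subscreens handled by each processor. */
int subscreens_per_processor(int nproc)
{
    return total_subscreens(nproc) / nproc;
}

/* Frame buffer pixels held by each processor for a given image size. */
long pixels_per_processor(int width, int height, int nproc)
{
    return (long)width * height / nproc;
}
```

For a 1024x1024 image, pixels_per_processor() gives 16384 for 64 processors, 32768 for 32, and 
65536 for 16, matching the figures in the text. 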


Where the Frame Buffers are Stored in VRAM 


For many DEVtools applications it is helpful to understand where pixels are located in memory, 
which memory areas are used by the frame buffer and which memory areas are available, and so on. 
The figures that follow illustrate which memory is used for frame buffer storage on each system 
configuration. All examples are for 1024x1024 images on 16 and 32 processor systems, and for 
1280x1024 for 20 and 40 processor systems. Examples of both 1280x1024 and 1024x1024 are pro- 
vided for 64 processor systems. 


Figure 3-1 illustrates the two 256k banks of VRAM found on each pixel node. Each bank consists 
of two planes, and each plane consists of a 256x256 array. Each element of the array contains two 
color components: the first plane contains the red and green values, while the second plane contains 
blue and overlay. 
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Figure 3-1: Pixel Nodes: Video Memory Organization 
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Figures 3-2-3-5 designate the portion of VRAM that is used to contain the frame buffer. The Pixel 
Machine supports double-buffering on all configurations. The buffer shown is the memory used in 
single-buffer mode, or what is called the "top buffer" in double-buffered mode. The second 
buffer, or "bottom buffer", is stored in the lower portion of the boxes shown in the figures and 
always begins at row number 128. The names "top" and "bottom" refer to the location of the 
buffer within VRAM, and should not be confused with "front" and "back", which refer to the 
buffer currently being displayed and the buffer being updated, respectively. 





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Figure 3-2 shows the top buffer of a 964 operating in 1024x1024 mode. Each buffer consists of a 
single subscreen containing 128x128 pixels. 


Figure 3-2: Frame Buffer Organization on a Model 964 


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Figure 3-3 shows the top buffer of a 964 operating in 1280x1024 mode. Each buffer consists of a 
single subscreen containing 160x128 pixels. 


Figure 3-3: Frame Buffer Organization on a Model 964X 
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Figure 3-4 shows the top buffer of a 940 operating in 1280x1024 mode, or a 932 operating in 
1024x1024 mode. Each buffer consists of two subscreens, each containing 128x128 pixels. 


Figure 3-4: Frame Buffer Organization on a Model 940/32 
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Figure 3-5 shows the top buffer of a 920 operating in 1280x1024 mode, or a 916 operating in 
1024x1024 mode. Each buffer consists of four subscreens, each containing 128x128 pixels. 


Figure 3-5: Frame Buffer Organization on a Model 920/16 
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Subscreen to Screen Mapping 


Once you know where the subscreens are in memory, you must understand how the pixels in a 
given subscreen correspond to the pixels on the screen. Figures 3-6 through 3-10 show, for each 
configuration, where the pixels for a given subscreen are displayed. 


Figure 3-6 shows the mapping for a 964. With only a single subscreen, the 964 is the simplest 
case. Each processor displays every 8th pixel of every 8th scanline. 
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Figure 3-6: Processor to Screen Mapping on a Model 964 
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Figures 3-7 and 3-8 show the mappings for the 940 and 932, respectively. The 940 contains a 10x4 
array of processors, the 932 an 8x4 array. Each processor performs the function of two processors 
in the Y dimension, resulting in a 10x8 or 8x8 array of subscreens. 


Figure 3-7: Processor to Screen Mapping on a Model 940 
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Figure 3-8: Processor to Screen Mapping on a Model 932 
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On a 940, each processor displays: 
Subscreen 0    every 10th pixel of every 8th scanline, beginning with scanline PMoy 
Subscreen 1    every 10th pixel of every 8th scanline, beginning with scanline PMoy+4 


On a 932, each processor displays: 

Subscreen 0    every 8th pixel of every 8th scanline, beginning with scanline PMoy 
Subscreen 1    every 8th pixel of every 8th scanline, beginning with scanline PMoy+4 

Figures 3-9 and 3-10 show the mappings for the 920 and 916, respectively. The 920 contains a 
5x4 array of processors, the 916 a 4x4 array. Each processor performs the function of two 
processors in the X dimension and two processors in the Y dimension, for a total of four. The 
result is an array of 10x8 or 8x8 subscreens. 
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Figure 3-9: Processor to Screen Mapping on a Model 920 
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On a 920, each processor displays: 
Subscreen 0    every 10th pixel of every 8th scanline, beginning with pixel PMox of scanline PMoy 
Subscreen 1    every 10th pixel of every 8th scanline, beginning with pixel PMox of scanline PMoy+4 
Subscreen 2    every 10th pixel of every 8th scanline, beginning with pixel PMox+5 of scanline PMoy 
Subscreen 3    every 10th pixel of every 8th scanline, beginning with pixel PMox+5 of scanline PMoy+4 
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Figure 3-10: Processor to Screen Mapping on a Model 916 
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On a 916, each processor displays: 


Subscreen 0    every 8th pixel of every 8th scanline, beginning with pixel PMox of scanline PMoy 
Subscreen 1    every 8th pixel of every 8th scanline, beginning with pixel PMox of scanline PMoy+4 
Subscreen 2    every 8th pixel of every 8th scanline, beginning with pixel PMox+4 of scanline PMoy 
Subscreen 3    every 8th pixel of every 8th scanline, beginning with pixel PMox+4 of scanline PMoy+4 

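
The interleaved mapping described above can be sketched in a few lines of C. This is a 
host-side model under stated assumptions, not DEVtools code: the struct submap type and both 
function names are hypothetical, and the sketch assumes that processor-space coordinates (i, j) 
count pixels and scanlines within a subscreen starting from zero. It models the four subscreens 
of one 916 processor whose offsets are pmox and pmoy. 

```c
/* Hypothetical description of one subscreen; this is NOT the
   layout of the DEVtools PMsubscrn structure. */
struct submap {
    int nx, ny;   /* spacing on the screen between this subscreen's pixels */
    int ox, oy;   /* screen position of the subscreen's first pixel */
};

/* Describe the four subscreens of one 916 processor, following the
   list above: subscreens 2 and 3 start 4 pixels further along the
   scanline, subscreens 1 and 3 start 4 scanlines further down. */
void make_916_subscreens(int pmox, int pmoy, struct submap s[4])
{
    int k;

    for (k = 0; k < 4; k++) {
        s[k].nx = 8;
        s[k].ny = 8;
        s[k].ox = pmox + ((k & 2) ? 4 : 0);
        s[k].oy = pmoy + ((k & 1) ? 4 : 0);
    }
}

/* Convert subscreen coordinates (i, j) to screen coordinates (x, y). */
void to_screen(const struct submap *s, int i, int j, int *x, int *y)
{
    *x = s->ox + i * s->nx;
    *y = s->oy + j * s->ny;
}
```

With pmox = 1 and pmoy = 2, pixel (0, 0) of subscreen 3 lands at screen position (5, 6), and 
pixel (127, 127) of subscreen 0 lands at (1017, 1018). 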
Subscreens in Z Memory 


When Z memory is being used to store pixel-related data, such as Z-buffer values, it is necessary 
to divide the Z memory into subscreens. Figure 3-11 shows the Z memory subscreen mapping used 
by such DEVtools functions as PMputzbuf(). 
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Figure 3-11: Z-Buffer Mapping on a Model 916/920 
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Programming with Subscreens 


Program code can be broken into three classes: 
■ subscreen independent code: code that must be executed only once 
■ subscreen dependent code: code that must be executed once for each subscreen 
■ code that functions correctly whether executed once or once for each subscreen 


Subscreen independent code includes functions such as reading data from an input FIFO, swapping 
buffers with PMswapbuff(), and initializing Z memory using PMzbrk(). These are functions that 
should not be repeated for each subscreen. 


Subscreen dependent code includes computing the processor space coordinates using the PMilo() 
and PMihi() macros, and updating the frame buffer using functions such as PMclear() and 
PMputpix(). All functions that take a subscreen argument (e.g., PMilo(), PMihi()) are subscreen 
dependent. 


Programs typically consist of a subscreen independent function that reads data from the input FIFO, 
performs some processing and then calls a subscreen dependent function N times, once for each 
subscreen. The subscreen dependent function receives as an argument a pointer to a structure that 
describes the subscreen to be processed. 


Code that functions correctly as either subscreen dependent or independent could include part of an 
application that performs calculations on data not directly related to drawing pixels on the screen. 


PMsubscrn is the type name of the structure used to describe a subscreen. The fields contained in 
each subscreen structure are: 


Nx      number of subscreens or virtual nodes in the X dimension 
Ny      number of subscreens or virtual nodes in the Y dimension 
Ox      offset of this subscreen in the X dimension 
Oy      offset of this subscreen in the Y dimension 
ifix    byte offset from the beginning of the VRAM row of this subscreen 
jfix    specifies whether the subscreen is in VRAM 0 or VRAM 1 and whether it is in 
        the top buffer or bottom buffer 


All of these values are stored as floating point values. 


The ifix and jfix values can be useful when determining the location of a subscreen if page registers 
are being used to update the frame buffer. The subscreen structure also contains values that are 
used to compute the PMilo(), PMihi(), PMjlo(), and PMjhi() functions. 
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DEVtools includes a simple facility that can be used to call a subscreen dependent function the 
appropriate number of times with the proper subscreen structure pointer. This is done by using the 
PMapply() function. PMapply() calls a specified function and passes, as the first argument, the 
subscreen pointer. For example the statement: 


PMapply(PMclear, 0, 0, PMimax, PMjmax, &color); 





is functionally equivalent to: 


for (i = 0; i < PMnindex; i++) { 
    PMclear(PMscrns[i], 0, 0, PMimax, PMjmax, &color); 
} 





If code is written to run on a specific model, the PMscrns array can be used to access the pointer 
for a specified subscreen. Code written exclusively for a 964, for example, could be written as: 


PMclear(PMscrns[0], 0, 0, PMimax, PMjmax, &color); 





The global variables PMmx and PMmy can be used to determine the number of subscreens in each 
dimension. When PMmx is true, there is more than one subscreen for each scanline (the X 
dimension). PMmx is true in high-resolution mode for all systems except the 964. When PMmy 
is true, there is more than one subscreen for each scan column (the Y dimension). PMmy is true in 
high-resolution mode for the 916 and 920. 
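
As a quick check of how the two flags relate to the subscreen count, the sketch below applies the 
formula given with the PMnindex declaration (2*PMmy + PMmx + 1). The struct, table and function 
names are illustrative and not part of DEVtools; the flag values are the high-resolution values 
described in the paragraph above. 

```c
/* Illustrative table of high-resolution flag values per model;
   the names here are hypothetical, not DEVtools identifiers. */
struct model_flags {
    int model;
    int mx;     /* value of PMmx for this model */
    int my;     /* value of PMmy for this model */
};

static const struct model_flags flags[] = {
    { 964, 0, 0 },
    { 940, 1, 0 },
    { 932, 1, 0 },
    { 920, 1, 1 },
    { 916, 1, 1 }
};

/* Number of subscreens per processor: 2*PMmy + PMmx + 1. */
int subscreen_count(int mx, int my)
{
    return 2 * my + mx + 1;
}

/* Look up a model and return its subscreen count, or -1 if unknown. */
int count_for_model(int model)
{
    unsigned k;

    for (k = 0; k < sizeof flags / sizeof flags[0]; k++)
        if (flags[k].model == model)
            return subscreen_count(flags[k].mx, flags[k].my);
    return -1;
}
```

The result is the loop bound a program would use when it visits each subscreen in turn. 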


Subscreens and Video Formats 


NTSC uses only a single subscreen on all configurations. PAL uses the same number of subscreens 
as high-resolution, however, each subscreen is smaller. 
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Accessing Memory Without Subscreens 


Images are sometimes stored in memory without using subscreens. This can simplify some image 
manipulation tasks when the data is not required to be in a displayable format while being pro- 
cessed. For example, an image copied into Z memory using the PMcopyvtoz() function is not 
stored in subscreen format in Z memory. A special subscreen structure is provided to allow 
functions such as PMilo() and PMihi() to be used with these images. The global variable PMrealscrn 
can be used to access this structure. In the PMrealscrn structure, the physical node counts and 
offsets are always equal to the virtual values. 
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There are several different types of memory associated with each pixel node. These can be divided 
into three groups: Static Memory (SRAM), Video Memory (VRAM), and Dynamic Memory 
(DRAM). SRAM includes 1kx32 bits of on-chip memory and 8kx32 bits of off-chip memory, which 
are both available for program storage. Additionally, there is memory for the frame buffer and 
pixel data (usually for z-depth information). This additional memory consists of two banks of 
256x256x32-bit VRAM, referred to as VRAM0 and VRAM1, and one bank of 256x256x32-bit 
DRAM used for the z-buffer, which is also referred to as ZRAM. 


Figure 3-12: Pixel organization of the rgba (video) and z (dynamic) memories 
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VRAM and ZRAM Access 


Because of the way that non-program memory is organized, special low level functions and macros 
are needed to access it. These functions are needed because access to this memory involves using 
page registers. In addition, each bank of pixel memory (VRAM) is actually composed of two 
planes, an RG (red/green) plane and a BO (blue/overlay) plane, that are accessed separately. Also, 
only 8 bits out of each 16 are actually used. For additional information, see the section "Pixel 
Nodes" in Chapter 1 of this guide. 


Following is a list of library routines to help access memory correctly: 





Table 3-1: VRAM Access 


Routine        Function 


PMgetscan()    read a scanline from video memory 
PMputscan()    write a scanline to video memory 
PMpixaddr()    generate a pointer to a specific pixel 
PMgetpix()     read a pixel from the specified subscreen 
PMputpix()     write a pixel to the specified subscreen 
PMv0get()      read a pixel from video buffer 0 
PMv0put()      write a pixel to video buffer 0 
PMv1get()      read a pixel from video buffer 1 
PMv1put()      write a pixel to video buffer 1 
PMqget()       quick read of a pixel from the current buffer 
PMqput()       quick write of a pixel to the current buffer 











Table 3-2: ZRAM Access 


Routine        Function 


PMgetzbuf()    read a float value from the Z-buffer 
PMputzbuf()    write a float value to the Z-buffer 
PMzget()       read a float from the Z-buffer 
PMzput()       write a float to the Z-buffer 
PMqzget()      quick read of Z value from the Z-buffer 
PMqzput()      quick write of Z value to the Z-buffer 
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There are two types of functions that access pixels: those that use subscreens and those that do not 
use subscreens. For example, when you are rendering, you would want to access pixels with refer- 
ence to where they map to the screen. In this case you would want to use subscreens, but if you 
did not care about the mapping, you could use the direct functions. 


When accessing data row by row, you can substantially increase the efficiency by careful use of the 
"quick" routines, after setting up the pointer and page register by a call to the appropriate function. 
For example, to update an entire scanline, use PMpixaddr() to generate a pointer to the first pixel 
on the line: 


dptr = PMpixaddr(scrn, 0, j); 


Besides returning the pointer, PMpixaddr() and the other memory functions also have the effect 
of setting up the appropriate mapping registers. Next, use dptr as an argument to PMqput(): 


dptr = PMqput (color, dptr); 


PMqput() also returns a pointer to the next element. It is safe to use the q routines up to 
PMimax times before reaching the subscreen boundary, at which time a new pointer 
needs to be generated. 
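
The access pattern just described (one address lookup per row, followed by a bounded run of quick 
writes) can be modeled on any machine with the sketch below. Everything in it is a stand-in: 
sim_pixaddr() and sim_qput() only imitate the calling pattern of PMpixaddr() and PMqput(), the 
pixel_t type and the IMAX and JMAX sizes are assumptions, and no page registers are involved. 

```c
#define IMAX 8   /* stand-in for PMimax: pixels per subscreen row */
#define JMAX 4   /* stand-in for PMjmax: rows per subscreen */

typedef unsigned long pixel_t;          /* illustrative pixel type */

static pixel_t subscreen[JMAX][IMAX];   /* simulated subscreen memory */

/* Stand-in for PMpixaddr(): return a pointer to pixel (i, j).
   The real routine also sets up the mapping registers. */
pixel_t *sim_pixaddr(int i, int j)
{
    return &subscreen[j][i];
}

/* Stand-in for PMqput(): store one pixel and return a pointer
   to the next element in the row. */
pixel_t *sim_qput(pixel_t color, pixel_t *p)
{
    *p = color;
    return p + 1;
}

/* Fill the subscreen row by row. A fresh pointer is generated at the
   start of every row, and at most IMAX quick writes follow it. */
void fill_subscreen(pixel_t color)
{
    int i, j;

    for (j = 0; j < JMAX; j++) {
        pixel_t *dptr = sim_pixaddr(0, j);

        for (i = 0; i < IMAX; i++)
            dptr = sim_qput(color, dptr);
    }
}
```

The point of the structure is that the comparatively expensive lookup happens once per row rather 
than once per pixel. 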


The same applies for ZRAM. However, instead of using PMqzget or PMqzput the pointer can be 
used directly because it points to real 32-bit memory (except that the access is slower than SRAM). 
The Z pointer can be incremented up to PMimax times, or, if you are not using subscreens, up to 
256 times before the page registers have to be updated by a call to one of the other z routines. 


Using Z Memory As General Purpose Memory 


Z memory can be used for both Z values used in displaying images and as general purpose memory 
for data storage. A number of functions have been provided to facilitate the use of Z memory as 
general purpose memory. These functions correspond, somewhat, to the malloc function available 
on most UNIX systems, but their use is more complex. The additional complexity is due to the 
limitations imposed by page registers. These functions hide most of the details of page register 
usage, but still impose some responsibility on the user. Thus programs that used malloc on a 
UNIX system based processor will require some modifications. 


The functions used to manage Z memory are: 
■ PMzbrk(): used to reserve the general purpose Z memory pool. 
■ PMgetzdesc(): gets a block of Z memory of the requested size, and returns addressing 
  information. The memory is still not accessible by the program. 
■ PMgetzaddr(): makes the memory accessible to the program and returns the address at 
  which it can be accessed. 
■ PMfreezaddr(): frees the address space to be used for another block of memory. 


PMzbrk() is called first to initialize the values that will be used by the other functions. Its only 
argument is the number of ZRAM rows to devote to allocation by these functions. The memory is 
allocated from high numbers down, so that, if viewed as rows, PMzbrk(1) would set aside row 255. 
The memory not allocated by PMzbrk() is still available for other purposes. 


After a large chunk of memory is reserved by PMzbrk(), the memory is subdivided and allocated 
with PMgetzdesc(). PMgetzdesc() is called with the number of bytes desired. It rounds up to 
the nearest 4 byte boundary to make sure that the next block of memory is properly aligned for 
floats or other such data. It updates a private data structure to give the location of the next unallo- 
cated block of memory, and returns a PMzdesc structure that contains the information on the loca- 
tion of this block of memory. This return should be checked and its value retained. If a number of 
memory blocks are to be allocated in a program, that is, if PMgetzdesc() is called repeatedly, it is 
probably a good idea to create an array of type PMzdesc and keep the returns in that array. Once a 
block of memory has been allocated it cannot be returned to the memory pool. In most cases, the 
best way to use PMzdesc is to set up a set of buffers, that are re-used, as opposed to the usual way 
that malloc is used, allocating, freeing, and reallocating. 
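
The allocation behavior described above can be pictured with a small model. This is a sketch 
under stated assumptions, not the library's implementation: the struct and function names are 
hypothetical, a ZRAM row is taken to be 256 32-bit words (1024 bytes), and the model ignores any 
constraint that page registers may place on where a block can start. 

```c
#define ZROWS     256           /* rows of Z memory on a node */
#define ROW_BYTES (256 * 4)     /* assumed: 256 32-bit words per row */

/* Hypothetical model of the pool reserved by PMzbrk(). */
struct zpool {
    long first_row;   /* lowest row number belonging to the pool */
    long size;        /* pool size in bytes */
    long used;        /* bytes handed out so far */
};

/* Hypothetical model of a descriptor returned by PMgetzdesc(). */
struct zblock {
    int  valid;       /* nonzero if the request was satisfied */
    long offset;      /* byte offset of the block within the pool */
    long size;        /* size after rounding up to 4 bytes */
};

/* Reserve nrows rows, counting down from the highest row, so that
   reserving one row sets aside row 255. */
void pool_reserve(struct zpool *p, int nrows)
{
    p->first_row = ZROWS - nrows;
    p->size = (long)nrows * ROW_BYTES;
    p->used = 0;
}

/* Hand out the next block. The size is rounded up to a multiple of
   4 bytes; blocks are never returned to the pool. */
struct zblock pool_get(struct zpool *p, long nbytes)
{
    struct zblock b = { 0, 0, 0 };
    long rounded = (nbytes + 3) & ~3L;

    if (nbytes > 0 && rounded <= p->size - p->used) {
        b.valid = 1;
        b.offset = p->used;
        b.size = rounded;
        p->used += rounded;
    }
    return b;
}
```

Under these assumptions a pool of 5 rows holds exactly twenty 256-byte blocks, so the 
twenty-first request fails, and a request for 5 bytes consumes 8. 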


PMgetzdesc() locates available Z memory but does not make it accessible by a program. This 
requires the setting of page registers and the determination of the correct pointer of the memory 
block given the page register used. This is the purpose of PMgetzaddr(). PMgetzaddr() searches 
the list of page registers reserved for its use (as explained below) to find one which is not in use for 
some other purpose. Once one is found, it is loaded with the information from PMgetzdesc(), so 
that the processor can address the DRAM. The address is then calculated, so that the program can 
address that memory. The return from PMgetzaddr() should also be checked, because, even if 
there is enough memory, there may not be any available page registers. 


Page registers are a limited resource, and some functions require the use of certain ones. The fol- 
lowing table shows the allocation of page registers to functions in libpm.a: 
Table 3-3: Page Register Assignments 


Page Register  Function 


0              PMgetscan(), PMputscan(), PMclear(), PMgetcol(), PMputcol(), 
               PMgetrow(), PMputrow() 
1              PMgetscan(), PMputscan(), PMclear(), PMgetcol(), PMputcol(), 
               PMgetrow(), PMputrow() 
2              PMv0get(), PMgetpix(), PMgetcol(), PMputcol(), PMgetrow(), PMputrow() 
3              PMv0get(), PMgetpix(), PMgetcol(), PMputcol(), PMgetrow(), PMputrow() 
4              PMv0put(), PMputpix() 
5              PMv0put(), PMputpix() 
6              PMv1get() 
7              PMv1get() 
8              PMv1put() 
9              PMv1put() 
10             PMpixaddr() 
11             PMpixaddr() 
12             PMzget(), PMgetzbuf(), PMzaddr() 
13             PMzput(), PMputzbuf(), PMzaddrcol() 
14             Reserved for host use 
15             Reserved for host use 





It is unlikely that all of these functions will be used in a given program. It is likely that a set of 
these functions will be used with the Z memory allocation functions. The Z memory allocation rou- 
tines have been designed to be flexible in the use of page registers. There is an array that maintains 
the busy status of each page register. Page registers can be made available for Z memory alloca- 
tion, or set aside for use by the routines listed in the table, by the use of macros. The macro 
PMblock_reg() sets aside a page register, so it will not be used by the Z memory allocation rou- 
tines. It takes as its argument the number of the page register. PMblock_reg() puts a non-zero 
value into the busy status array element for that page register. The opposite function is served by 
the macro PMavail_reg(), which puts a zero into that element. Because external memory is not 
always cleared when a program is restarted, it is a good idea to explicitly set the status of every 
page register before the Z memory allocation functions are used. The macro PMset_lowreg() is 
provided to reduce unnecessary searching through reserved page registers. Normally, the page regis- 
ter status array is searched from 0 to 13. If, say, only 12 and 13 are available, this is extra work for 
the machine. PMset_lowreg() can be used to restrict the search by setting the low register to 12. 
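
The busy status array and the search limit can be modeled as follows. This sketch only mirrors 
the behavior described above: the reg_* names are hypothetical stand-ins for the 
PMblock_reg(), PMavail_reg() and PMset_lowreg() macros, and reg_claim() is an assumed 
picture of the search that the Z memory allocation routines perform. 

```c
#define NREGS 14            /* registers 0 through 13 are searched */

static int busy[NREGS];     /* nonzero means the register is in use */
static int lowreg = 0;      /* first register examined by the search */

/* Stand-in for PMblock_reg(): keep register r away from the allocator. */
void reg_block(int r)
{
    busy[r] = 1;
}

/* Stand-in for PMavail_reg(): make register r available. */
void reg_avail(int r)
{
    busy[r] = 0;
}

/* Stand-in for PMset_lowreg(): skip registers below r when searching. */
void reg_set_low(int r)
{
    lowreg = r;
}

/* Assumed search: claim the first free register at or above lowreg.
   Returns the register number, or -1 if none is available. */
int reg_claim(void)
{
    int r;

    for (r = lowreg; r < NREGS; r++) {
        if (!busy[r]) {
            busy[r] = 1;
            return r;
        }
    }
    return -1;
}
```

A program would typically set the status of every register explicitly at startup, for example by 
blocking 0 through 11 and making 12 and 13 available, and then raise the low limit to 12. 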
After PMgetzaddr() returns, the segment of Z memory is in the address space of the program. 
The pointer can be used as any pointer would be used. When this memory is not in use, the page 
register should be freed so that other Z memory can be used and PMgetzaddr() called. This is done 
with PMfreezaddr(). PMfreezaddr() does not free Z memory or modify the contents of Z 
memory. It only makes a page register available for use elsewhere. When memory needs to be 
reaccessed, the function PMgetzaddr() should be called again using the Z descriptor for that 
memory. The address may be different, but the contents will not be changed. 


The procedures can be illustrated by the following code fragment: 
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#define NULL (char *)0 
#define MAXDESC 21 

#include "pxm.h" 
#include "pageregs.h" 


main() 
{ 
    PMzdesc desc[MAXDESC]; 
    int i, j; 
    char *ptrval1; 
    char *tptr1; 
    char *tptr4; 
    char *msgptr = "Got memory on pass"; 

    PMzbrk(5);              /* make 5k usable for general purpose */ 

    /* set limits on page registers */ 
    PMset_lowreg(10); 
    PMset_hireg(13); 

    for (j = 0; ; j++) {    /* loop until error */ 
        desc[j] = PMgetzdesc(256); 
        if ( !PMzdesc_valid(desc[j]) ) {    /* no memory left */ 
            printf("No DRAM memory available, pass %d\n", j); 
            break; 
        } 
        if ( (ptrval1 = PMgetzaddr(desc[j])) == NULL ) { 
            /* no pointers (i.e., page registers) left */ 
            printf("No Page Registers available, pass %d\n", j); 
            break; 
        } 

        /* copy string to the allocated mem in ZRAM */ 
        /* (this is generally not an efficient copy on DSP's) */ 
        for (i = 0, tptr1 = ptrval1, tptr4 = msgptr; i < 19; i++) { 
            *tptr1++ = *tptr4++; 
        } 

        /* print string from ZRAM */ 
        printf("%s %d\n", ptrval1, j); 

        /* block till printf is done so we don't change page reg before 
           devprint needs to read the string */ 
        PMwaitsem(); 

        /* comment out next call and reuse desc's to get no page registers avail */ 
        PMfreezaddr(ptrval1);   /* make pointers available */ 
    } 

    PMhost_exit(); 
} 
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A Pixel Machine program using SIO progresses through three stages: initialization, setting link 
direction, and exchanging data. Each stage is explained in detail below, and a short example pro- 
gram demonstrating SIO is provided at the end of this section. 


Initialization 


Every pixel node program using SIO must call the DEVtools routine PMsioinit before any other 
SIO routines are called. PMsioinit initializes the SIO hardware and must be called only once in 
each program. 


Setting Link Direction 


To change the SIO link direction, call the DEVtools routine PMsiodir. This routine takes one 
parameter specifying the direction, which must be one of: 


PM_MSG_SERIAL_NORTH 
PM_MSG_SERIAL_SOUTH 
PM_MSG_SERIAL_EAST 
PM_MSG_SERIAL_WEST 


These constants are defined in the header file sysmsg.h, which must be #included before PMsiodir 
is called. 


PMsiodir requires that a host server process such as devprint (described earlier in this document), 
or a user program calling the host devlib library, be running on the host machine. PMsiodir sends 
a message to the server process requesting that a call to the host library routine 
DEVserial_direction be made to actually change the link direction. 


All nodes must call PMsiodir. 


Exchanging Data 


After SIO is initialized and the link direction set, data packets may be exchanged with neighboring 
nodes. All nodes must transmit simultaneously and send exactly the same amount of data as all 
other nodes. The sequence of DEVtools calls to exchange a data packet is: 
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float *inbuf, *outbuf; 
short size; 


PMmsg_setup(inbuf); 
PMpsync(); 
PMmsg_exchange(inbuf, outbuf, size); 





The variables outbuf and inbuf are pointers to an output buffer, whose contents are sent in the link 
direction, and an input buffer which receives data from the opposite direction. 


The call to PMmsg_setup sets up the SIO hardware to do DMA input to inbuf. Next, the 
PMpsync call ensures that all processors are synchronized and have set up their input buffers. 
Finally, PMmsg_exchange is called to exchange data packets. It sends size floats from outbuf and 
then waits until size floats have been received into inbuf. 
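
The effect of a simultaneous exchange can be pictured with a host-side simulation of one row of 
nodes. This is only a model of the data movement, not SIO code: the exchange() function and the 
NODES and SIZE constants are hypothetical, node k's west neighbor is taken to be node k-1, and 
the row is assumed to wrap around at its ends, which the hardware may or may not do. 

```c
#include <string.h>

#define NODES 4     /* simulated nodes in one row */
#define SIZE  3     /* floats per data packet */

/* Simulate every node sending its packet one step in the link
   direction at the same time. dir is -1 for west and +1 for east.
   out[k] is node k's output buffer; in[k] is its input buffer. */
void exchange(float out[NODES][SIZE], float in[NODES][SIZE], int dir)
{
    int k;

    for (k = 0; k < NODES; k++) {
        /* node that receives node k's packet (assumed wraparound) */
        int dest = (k + dir + NODES) % NODES;

        memcpy(in[dest], out[k], sizeof in[dest]);
    }
}
```

Calling exchange() with dir = -1 and then passing the received packets back with dir = +1 
returns every packet to the node it started from, which is the round trip that a pair of 
exchanges in opposite link directions performs. 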


Example Program 


The short program below uses SIO to send a data packet from each node to its west neighbor, then 
sends the received data packet back to its starting point. 
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Figure 3-13: SIO sample program 


#include <pxm.h> 
#include <sysmsg.h> 


#define SIZE 100 


main() { 
    float inbuf[SIZE], outbuf[SIZE]; 
    short i; 

    /* Fill the data packet with a sequence of numbers */ 
    for (i = 0; i < SIZE; i++) 
        outbuf[i] = PMnode * i; 

    PMsioinit();    /* Initialize SIO */ 

    /* Set link direction */ 
    PMsiodir(PM_MSG_SERIAL_WEST); 

    /* Send outbuf to the West, receive inbuf from the East. */ 
    PMmsg_setup(inbuf); 
    PMpsync(); 
    PMmsg_exchange(inbuf, outbuf, SIZE); 

    /* Reverse link direction */ 
    PMsiodir(PM_MSG_SERIAL_EAST); 

    /* Send inbuf to the East, receive outbuf from the West. */ 
    PMmsg_setup(outbuf); 
    PMpsync(); 
    PMmsg_exchange(outbuf, inbuf, SIZE); 
} 
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Introduction 


It is often necessary for the pixel nodes to synchronize themselves to ensure they have all reached 
the same stage of a computation before continuing to compute. An example is making sure that all 
processors have finished rendering their portion of a frame into the back buffer before switching it 
to become the visible buffer. 


Hardware features of the Pixel Machine support this synchronization; alternatively, synchronization 
may be done in software with the aid of the host workstation. Both approaches are described in this 
section. 


Hardware Synchronization 


The basic DEVtools synchronization routine is named PMpsync. When called, this routine does 
not return until all pixel nodes have called PMpsync. This form of synchronization has very low 
overhead, returning within twelve instruction cycles of synchronization. Because PMpsync is so 
efficient, it is heavily used in internode communications routines. 


The PMvsync DEVtools routine is similar to PMpsync. PMvsync is used to accomplish 
synchronization for tasks that change the displayed image; for example, before switching buffers in 
double-buffer mode. Like PMpsync, PMvsync waits until all processors have called it. It then 
waits for the beginning of a vertical blanking interval, when the electron beam of the monitor 
has reached the beginning of a field. PMvsync returns within twelve instruction cycles of the 
interval. It may take up to 1/60th second to return, depending on the position of the electron beam 
when the routine is called. 


Since there is very little time between the start of the blanking interval and the time pixels in a 
frame begin to be displayed, PMvsync returns as soon as the interval is detected. The hardware 
signal used for synchronization must be turned off before another call to PMvsync is made, using 
another DEVtools routine, PMrdyoff. 


In summary, PMvsync is used as follows: 
PMvsync();      /* Return after start of blanking interval */ 
(swap buffers or similar task) 
PMrdyoff();     /* Disable VSYNC signal for the next call */ 


Note that in the normal case, PMswapbuff() should be used if you intend to swap the visible 
buffer. PMvsync() and PMrdyoff() should only be used when synchronization with the vertical 
retrace is desired for some other purpose. 
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Software Synchronization 


Another method of synchronizing processors requires the aid of the host workstation. The host 
polls the nodes, waiting for a software condition to be established. An example would be waiting 
for nodes to set their software semaphores to a specific value. After this condition is established 
and the work associated with it accomplished, the host establishes a different software condition in 
the nodes enabling them to proceed. This form of synchronization requires that the host and nodes 
agree upon the software protocol to be used. Many protocols may be used; below is a simple exam- 
ple using the software semaphore: 


Host action                             Node action 


Wait for semaphore == 1 in all nodes    Set semaphore = 1 
Take action upon synchronization        Wait for semaphore == 0 
Set semaphore = 0 in all nodes 
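
The handshake in the table can be traced with a single-process simulation. This is a sketch of 
the protocol only: the sem array stands in for the per-node software semaphores, and all four 
function names are hypothetical rather than DEVtools or host library calls. 

```c
#define NODES 4

static int sem[NODES];      /* one simulated software semaphore per node */

/* Node action: announce that this node has reached the sync point. */
void node_signal(int n)
{
    sem[n] = 1;
}

/* Node action: test whether the host has released this node. */
int node_may_proceed(int n)
{
    return sem[n] == 0;
}

/* Host action: poll until this returns nonzero, meaning every
   node has set its semaphore to 1. */
int host_all_ready(void)
{
    int n;

    for (n = 0; n < NODES; n++)
        if (sem[n] != 1)
            return 0;
    return 1;
}

/* Host action: after doing its work, clear every semaphore so
   that the nodes can continue. */
void host_release(void)
{
    int n;

    for (n = 0; n < NODES; n++)
        sem[n] = 0;
}
```

On real hardware the host and each node run concurrently and poll these conditions in loops; 
the simulation simply exposes each step so the ordering can be checked. 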


Synchronization Signals and LEDs 


The hardware mechanism used by PMpsync and PMvsync is visible in the form of LEDs on the 
Pixel Array Processor boards. The strip of 8 red LEDs on each board contains 2 LEDs for each 
pixel node. The upper 4 LEDs show the state of the PMpsync signal in each node. The lower 4 
LEDs show the state of the PMvsync signal in each node: 


Figure 3-14: LED layout on pixel node boards 

psync - processor 3 
psync - processor 2 
psync - processor 1 
psync - processor 0 
vsync - processor 3 
vsync - processor 2 
vsync - processor 1 
vsync - processor 0 



These LEDs may be used for other purposes (such as a debugging aid) if the pixel node program in 
question makes no use of the corresponding synchronization calls (including calls to other 
DEVtools routines that require synchronization). 


If no use is made of PMpsync, the upper LEDs may be turned on and off explicitly with the call 
PMflagled. This takes one integer argument. If nonzero, the LED is turned on; otherwise it is 
turned off. 


Similarly, if no use is made of PMvsync, the lower LEDs may be turned on and off explicitly with 
the call PMrdyled, which takes the same on/off parameter as PMflagled. 
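

As an illustration of the debugging use mentioned above, the following sketch lights the upper LED while a piece of work runs and leaves the lower LED lit if that work reported an error. It assumes a pixel node program that never calls PMpsync or PMvsync (directly or through other DEVtools routines). The two PM routines are replaced here by stand-in stubs so that the sketch compiles away from the Pixel Machine; their void return type is an assumption, and run_marked, sample_ok and sample_fail are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in stubs, NOT the DEVtools implementations. On the Pixel Machine the
 * real PMflagled and PMrdyled would be linked in instead; the void return
 * type used here is an assumption. */
int flag_led_state = 0;
int rdy_led_state = 0;
void PMflagled(int on) { flag_led_state = (on != 0); }
void PMrdyled(int on)  { rdy_led_state = (on != 0); }

/* Hypothetical example workloads: zero means success, nonzero an error. */
int sample_ok(void)   { return 0; }
int sample_fail(void) { return 3; }

/* Hypothetical helper: the upper LED is on while `work` runs, and the lower
 * LED shows afterwards whether `work` returned an error. */
int run_marked(int (*work)(void))
{
    int status;

    assert(work != NULL);

    PMflagled(1);            /* upper LED on: entered the region */
    status = work();
    PMflagled(0);            /* upper LED off: left the region */

    PMrdyled(status != 0);   /* lower LED on only if an error occurred */
    return status;
}
```

With this arrangement, an upper LED that stays lit suggests a node stuck inside the marked region, while a lit lower LED identifies a node whose last marked region failed.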


Figure 3-15: Model 916 
Figure 3-16: Model 920 
Figure 3-17: Model 932 
Figure 3-18: Model 940 
Figure 3-19: Model 964 


The skeleton directory, /usr/hyper/devtools/sample/skeleton, contains a sample of a complete 
Pixel Machine program, that is, one that uses all the architectural components of the Pixel 
Machine. 


Sample Skeleton Program 


The skeleton program is a sample of how to use command passing through the system. The main 
program boots the pipe and pixel nodes with their corresponding programs and starts the Pixel 
Machine running, and then enters the main loop. In the first part of the loop the host sends down 
commands to alternately clear the screen to red and then blue while flashing the FLAG LEDs. A 
delay command is also sent down between colors; the delay is shorter as the loop progresses 
towards the end. 


In the next part of the loop, the host repeatedly sends down a random RGB color value followed by a 
command to draw a random rectangle of that color. At the end of the main loop, the host sends 
down a single command, the DEV_GENERATE opcode. This opcode instructs a pipe node to 
generate many random rectangles on its own. 


Finally, after the main loop exits, the host sends the pixel nodes a command to clean up and exit. It 
then calls DEVpoll_nodes() to wait for an exit code from the pixel nodes before exiting. 


Introduction 


Debugging programs that run on parallel computers is a task that strikes fear into the hearts of most 
programmers. Debugging code on the Pixel Machine, however, is much simpler than may be 
expected, and, in fact, is not much different than debugging on an ordinary computer. This is true 
because, in most cases, it is possible to debug a program on a single processor without worrying 
about what all of the other processors are doing. On the Pixel Machine this is possible because the 
processors do not share any of their memory with other processors. With the exceptions of serial 
DMA I/O with neighboring processors and parallel DMA I/O with the host, a single processor is in 
complete control of its environment. 


Tools for General Debugging 


Following are the tools available for general debugging: 


■ printf: the libpm library provides a version of printf that can be used to display data during 
the execution of a program on the Pixel Machine. The data is directed to the standard output 
of the controlling program running on the host. Print statements can then be used to display 
information as you would on a conventional system. 


■ User Messages: using the message handling routines provided by devlib, a Pixel Machine 
program can send messages to the host indicating what the Pixel Machine program is doing, 
providing values of variables, etc. The host program can then check the sequence of the 
events, the values of the variables, etc. 


■ hypeek and hypoke: these commands allow you to display and modify data in a node's 
memory. They are useful for examining the data of a running process. 


■ d3sim: the general-purpose DSP simulator provided with the DSP Tools to simulate and 
debug programs written for the DSP32. It can be used to debug Pixel Machine programs, except 
that it does not model any of the Pixel Machine-specific components such as FIFOs and 
frame buffers. For more information, see Chapter 6 of the WE® DSP32 and DSP32C 
Support Software Manual. 


Tools for Debugging Pipe Routines 


Programs that run in the pipe nodes usually perform operations similar to a UNIX system filter. 
They read input from the FIFO, perform some transformations, and output to the FIFO of the next 
processor. It is often useful to display the data that is the input to a given processor, and then to 
display the output of that processor (which is the input to the next processor). The program 
/usr/hyper/boot/pipe_fb.dsp can be loaded into a pipe node. It reads input from its FIFO and 
transfers the data to a host program through the PIR. A host program, called hypfb, can then read 
the PIR data and display it. 


For example, if you wanted to display the data that pipe node 0 is receiving as input, you would 
load pipe_fb.dsp into pipe node 0, run the host program that sends commands to the pipe, and run 
the program hypfb on the host. This would display all of the commands from the host as received 
by node 0. 


If you wanted to display the output of node 0, you would load pipe_fb.dsp into pipe node 1, run 
the host program that sends the commands, and run hypfb. This would display the output of node 
0, which is also the input to node 1. 
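

The relationship between the node that holds the feedback program and the data that is displayed can be illustrated with a small model in plain C. This is only a sketch of the idea, not DEVtools code: each pipe node is modeled as a function on a single data word, and stage_fn, stage_double, stage_add_one, pipe_stages and tapped_word are hypothetical names with made-up behavior.

```c
#include <assert.h>

/* A pipe node modeled as a pure function on one data word (illustrative). */
typedef int (*stage_fn)(int);

/* Two made-up stages standing in for user pipe node programs. */
int stage_double(int x)  { return 2 * x; }
int stage_add_one(int x) { return x + 1; }

#define NUM_STAGES 2
const stage_fn pipe_stages[NUM_STAGES] = { stage_double, stage_add_one };

/* Return the word that a feedback program loaded into node `tap` would
 * receive when the host sends `word`, that is, the result of running the
 * word through nodes 0 .. tap-1. A `tap` equal to NUM_STAGES models a
 * feedback program placed after the last modeled node. */
int tapped_word(int tap, int word)
{
    assert(tap >= 0 && tap <= NUM_STAGES);
    for (int i = 0; i < tap; i++)
        word = pipe_stages[i](word);
    return word;
}
```

In this model tapped_word(0, w) returns the host's word unchanged, which corresponds to loading the feedback program into node 0, and tapped_word(1, w) returns the output of node 0, which corresponds to loading it into node 1.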


If the program you are debugging accepts commands in the format supported by the devlib 
command macros and functions, the hypcmd command can be used to translate the output of hypfb 
into a more readable format. To use hypcmd, simply pipe the output of hypfb into hypcmd. 
The commands: 


hypload -g0 $HYPER_PATH/boot/pipe_fb.dsp 
hyprun -g0 
host_program 
hypfb -g0 | hypcmd 


will read the feedback information from pipe node 0 and translate it into command format. 
host_program must be run in the background or from a different window, because it and hypfb 
need to run at the same time. 